Combined bi-predictive fusion candidates for 3D video encoding
Patent abstract:
COMBINED BI-PREDICTIVE FUSION CANDIDATES FOR 3D VIDEO ENCODING. A video encoder generates a list of merge candidates to encode a video block of three-dimensional (3D) video data. The maximum number of merge candidates in the list of merge candidates can be equal to 6. As part of the generation of the list of merge candidates, the video encoder determines whether the number of merge candidates in the merge candidate list is less than 5. If this is the case, the video encoder derives one or more combined bi-predictive fusion candidates. The video encoder includes the combined bi-predictive fusion candidate or candidates in the list of merge candidates.
Publication number: BR112016008358A2
Application number: R112016008358-0
Filing date: 2014-09-19
Publication date: 2021-08-03
Inventors: Li Zhang; Ying Chen
Applicant: Qualcomm Incorporated
IPC main classification:
Patent description:
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 61/880,737, filed September 20, 2013, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
[0002] This disclosure relates to video encoding and compression, and more specifically to encoding techniques that can be used in three-dimensional (3D) video encoding.
BACKGROUND
[0003] Digital video capabilities can be incorporated into a wide range of devices, which include digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording apparatus, digital media players, video game apparatus, video game consoles, cellular or satellite radio telephones, video teleconferencing apparatus and the like. Digital video devices implement video compression techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard, and extensions of such standards, to more efficiently transmit, receive, and store digital video information.
[0004] Video compression techniques perform spatial (intra-picture) and/or temporal (inter-picture) prediction to reduce or remove the redundancy inherent in video sequences. For block-based video encoding, a video slice can be partitioned into video blocks, which can also be referred to as tree blocks, coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image. Video blocks in an inter-coded (P or B) slice of an image can use spatial prediction with respect to reference samples in neighboring blocks in the same image or slice, or temporal prediction with respect to reference samples in other reference images. Images can be referred to as frames, and reference images can be referred to as reference frames.
[0005] A multi-view encoding bitstream can be generated by encoding views from various perspectives, for example. Encoding multiple views can allow a decoder to choose between different views or possibly render multiple views. Furthermore, some three-dimensional (3D) video techniques and standards that have been developed or are under development make use of multi-view coding aspects. Three-dimensional video is also referred to as "3DV".
[0006] For example, different views can transmit left and right eye views to support 3D video. Alternatively, some 3D coding processes can apply so-called multi-view plus depth coding. In multi-view plus depth coding, a 3D bitstream can contain not only texture view components, but also depth view components. For example, each view can include a texture view component and a depth view component.
[0007] Currently, a VCEG and MPEG 3D Video Coding Joint Collaboration Team (JCT-3C) is developing a 3D video standard based on the emerging standard referred to as "high-efficiency video coding (HEVC)", for which part of the standardization efforts includes standardizing the multi-view video codec based on HEVC (MV-HEVC) and another part for 3D video coding based on HEVC (3D-HEVC). 3D-HEVC may include and support new coding tools, including those at the coding unit/prediction unit level, for both texture and depth views.
SUMMARY
[0008] In general, this disclosure relates to three-dimensional (3D) video encoding based on advanced codecs, including encoding two or more views with the High Efficiency Video Coding (HEVC) 3D codec. For example, some examples of this disclosure describe techniques relating to combined bi-predictive fusion candidates. In some of such examples, as part of generating a list of merge candidates, a video encoder determines whether the number of merge candidates in the list is less than 5. If this is the case, the video encoder derives one or more combined bi-predictive fusion candidates. The video encoder includes the combined bi-predictive fusion candidate or candidates in the merge candidate list.
[0009] In one aspect, this disclosure describes a method for encoding 3D video data. The method comprises generating a list of merge candidates to encode a video block of 3D video data. The maximum number of merge candidates in the merge candidate list is equal to 6, and generating the merge candidate list comprises: determining whether the number of merge candidates in the merge candidate list is less than 5; and, in response to the determination that the number of merge candidates in the list of merge candidates is less than 5, deriving one or more combined bi-predictive fusion candidates, where each respective combined bi-predictive fusion candidate of the combined bi-predictive fusion candidate or candidates corresponds to a respective pair of merge candidates already in the merge candidate list, where the respective combined bi-predictive fusion candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, and where the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to images in different reference image lists. The method also comprises including the combined bi-predictive fusion candidate or candidates in the list of merge candidates.
[0010] In another aspect, this disclosure describes a video encoding apparatus comprising a data storage medium configured to store 3D video data; and one or more processors configured to generate a merge candidate list to encode a video block of the 3D video data, in which the maximum number of merge candidates in the merge candidate list is equal to 6, and in which, as part of the generation of the merge candidate list, the processor or processors: determine whether the number of merge candidates in the merge candidate list is less than 5; and, in response to the determination that the number of merge candidates in the list of merge candidates is less than 5, derive one or more combined bi-predictive fusion candidates, in which each respective combined bi-predictive fusion candidate of the combined bi-predictive fusion candidate or candidates corresponds to a respective pair of merge candidates already in the merge candidate list, in which the respective combined bi-predictive fusion candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, and in which the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to images in different reference image lists. The processor or processors are configured to include the combined bi-predictive fusion candidate or candidates in the list of merge candidates.
[0011] In another aspect, this disclosure describes a video encoding apparatus comprising: an apparatus for generating a list of merge candidates for encoding a video block of 3D video data. The maximum number of merge candidates in the merge candidate list is equal to 6, and the apparatus for generating the merge candidate list comprises: an apparatus for determining whether the number of merge candidates in the merge candidate list is less than 5; and an apparatus for deriving, in response to the determination that the number of merge candidates in the list of merge candidates is less than 5, one or more combined bi-predictive fusion candidates, in which each respective combined bi-predictive fusion candidate of the combined bi-predictive fusion candidate or candidates corresponds to a respective pair of merge candidates already in the merge candidate list, in which the respective combined bi-predictive fusion candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, and in which the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to images in different reference image lists. The video encoding apparatus also comprises an apparatus for including the combined bi-predictive fusion candidate or candidates in the list of merge candidates.
[0012] In another aspect, this disclosure describes a computer-readable data storage medium that has instructions stored thereon which, when executed, cause a video encoding apparatus for 3D video data to: generate a list of merge candidates to encode a video block of the 3D video data. The maximum number of merge candidates in the merge candidate list is equal to 6. Generating the merge candidate list comprises: determining whether the number of merge candidates in the merge candidate list is less than 5; and, in response to the determination that the number of merge candidates in the list of merge candidates is less than 5, deriving one or more combined bi-predictive fusion candidates, in which each respective combined bi-predictive fusion candidate of the combined bi-predictive fusion candidate or candidates corresponds to a respective pair of merge candidates already in the merge candidate list, in which the respective combined bi-predictive fusion candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, and in which the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to images in different reference image lists. The instructions further cause the video encoding apparatus to include the combined bi-predictive fusion candidate or candidates in the merge candidate list.
[0013] Details of one or more examples are given in the accompanying drawings and in the description that follows. Other features, objects and advantages will be evident from the description, drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1 is a block diagram showing an exemplary video encoding system that can utilize the techniques of this disclosure.
[0015] Figure 2 is a conceptual illustration showing spatial neighbors that are potential candidates for a merge list.
[0016] Figure 3 is a conceptual diagram showing spatial and temporal neighbor blocks relative to the current coding unit.
[0017] Figure 4 shows an example of a derivation process for an inter-view predicted motion vector candidate.
[0018] Figure 5 is a conceptual diagram showing depth block derivation from a reference view to perform backward view synthesis prediction (BVSP).
[0019] Figure 6 is a conceptual diagram showing four corner pixels of an 8x8 depth block.
[0020] Figure 7 is a table that presents an exemplary specification of l0CandIdx and l1CandIdx in 3D-HEVC.
[0021] Figure 8 is a block diagram showing an exemplary video encoder that can implement the techniques of this disclosure.
[0022] Figure 9 is a block diagram showing an exemplary video decoder that can implement the techniques of this disclosure.
[0023] Figure 10A is a flowchart showing an exemplary operation of a video encoder to encode data associated with 3D video, in accordance with one or more techniques of this disclosure.
[0024] Figure 10B is a flowchart showing an exemplary operation of a video decoder to decode data associated with 3D video, in accordance with some techniques of this disclosure.
[0025] Figure 11 is a flowchart showing a first part of an exemplary operation to build a list of merge candidates, in accordance with one or more techniques of this disclosure.
[0026] Figure 12 is a flowchart showing a second part of the exemplary operation of Figure 11 to build a list of merge candidates for the current block, in accordance with one or more techniques of this disclosure.
[0027] Figure 13 is a flowchart showing an exemplary derivation process for combined bi-predictive fusion candidates, in accordance with one or more techniques of this disclosure.
[0028] Figure 14A is a flowchart showing an exemplary operation of a video encoder to encode a block of video, in accordance with one or more techniques of this disclosure.
[0029] Figure 14B is a flowchart showing an exemplary operation of a video decoder to decode a block of video, in accordance with one or more techniques of this disclosure.
DETAILED DESCRIPTION
[0030] Video encoding is a process of transforming video data into encoded video data. In general, the video decoder reverses the transformation, thus rebuilding the video data. Video encoding and video decoding can both be referred to as video coding. Block-based video encoding is a type of video encoding that works, at least in part, on blocks of video data within images.
[0032] In bidirectional inter-prediction, the video encoder determines two predictive blocks for the current block. Therefore, the video encoder also determines two motion vectors for the current block. The two predictive blocks for the current block can be in different reference images. Consequently, in bidirectional inter-prediction, the video encoder can determine two reference indices for the current block (i.e., a first reference index and a second reference index). The first and second reference indices indicate the locations of reference images within first and second lists of reference images, respectively. Residual data for the current block can indicate differences between the current block and a synthesized predictive block that is based on the two predictive blocks for the current block.
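To make the notion of bi-predictive motion information concrete, the following minimal sketch shows a candidate structure that can hold a motion vector and a reference index for each of the two reference image lists. The type and field names are illustrative only and are not taken from any particular codec implementation.

```cpp
#include <array>
#include <cstdint>

// A motion vector, e.g. in quarter-sample units (illustrative).
struct MotionVector {
    int16_t x = 0;
    int16_t y = 0;
};

// Motion information that a merge candidate can carry for one block.
// A bi-predictive candidate has valid data for both RefPicList0 (index 0)
// and RefPicList1 (index 1); a uni-predictive candidate for only one of them.
struct MergeCandidate {
    std::array<MotionVector, 2> mv;           // mv[0] for list 0, mv[1] for list 1
    std::array<int8_t, 2> refIdx{{-1, -1}};   // -1 means "that list is not used"
    std::array<bool, 2> predFlag{{false, false}};

    bool isBiPredictive() const { return predFlag[0] && predFlag[1]; }
};
```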
[0033] The motion vectors of the current block can be similar to the motion vectors of blocks that are spatial or temporal neighbors of the current block (i.e., neighboring blocks). Consequently, it may be unnecessary for a video encoder to explicitly signal the motion vectors and reference indices of the current block. Instead, the video encoder can determine a list of merge candidates for the current block (i.e., a "merge candidate list"). Each of the merge candidates specifies a set of motion information (such as one or more motion vectors, one or more reference indices, etc.). The list of merge candidates may include one or more merge candidates that respectively specify the motion information of different neighboring blocks. Neighboring blocks can include spatial neighbor blocks and/or temporal neighbor blocks. This disclosure may refer to merge candidates based on spatial neighbor blocks as spatial merge candidates. This disclosure may refer to merge candidates based on temporal neighbor blocks as temporal merge candidates. In some examples, two merge candidates in the merge candidate list might have identical motion information. The video encoder can select one of the merge candidates and can signal a syntax element that indicates a position within the merge candidate list of the selected merge candidate.
[0034] The video decoder can generate the same merge candidate list (that is, a merge candidate list that duplicates the merge candidate list determined by the video encoder) and can determine, based on receipt of the signaled syntax element, the selected merge candidate. The video decoder can then use the motion information of the selected merge candidate as the motion information of the current block. In this way, the current block can inherit the motion information of one of the neighboring blocks.
[0035] In some circumstances, the motion information of a neighboring block may not be available. For example, the neighboring block may be encoded using intra-prediction, the neighboring block may be in a different slice, or the neighboring block may simply not exist. Consequently, there may be fewer than the required number of merge candidates (the maximum number of merge candidates, which might be indicated in a slice header, for example) in the list of merge candidates for the current block. Therefore, when a video encoder (a video encoder or a video decoder, for example) generates the merge candidate list for the current block, the video encoder can ensure that the merge candidate list for the current block includes the desired number of merge candidates by adding one or more artificial merge candidates to the list of merge candidates for the current block. Artificial merge candidates are merge candidates that do not necessarily specify the motion information of any spatial or temporal neighboring block.
[0036] Artificial merge candidates may include one or more combined bi-predictive fusion candidates. As noted above, a merge candidate can specify two motion vectors and two reference indices. A combined bi-predictive fusion candidate corresponds to a respective pair of merge candidates already in the list of merge candidates for the current block. Specifically, the combined bi-predictive fusion candidate is a combination of the motion vector and reference index of a first merge candidate of the respective pair, if available, and the motion vector and reference index of a second merge candidate of the respective pair, if available.
The motion vector of the first merge candidate and the motion vector of the second merge candidate refer to images in different reference image lists. Thus, combined bi-predictive fusion candidates can correspond to different combinations of motion vectors/reference indices of different existing merge candidates (merge candidates other than combined bi-predictive fusion candidates, such as spatial or temporal merge candidates, for example). For example, when the RefPicList0 motion information of a first merge candidate and the RefPicList1 motion information of a second merge candidate are both available and are not identical (i.e., either the reference images are different or the motion vectors are different), a combined bi-predictive fusion candidate is constructed. Otherwise, the next respective pair is checked.
[0037] In some versions of the HEVC specification, the maximum value of the required number of merge candidates in a merge candidate list is 5. Also, in some cases, the desired number of merge candidates in a merge candidate list is 5. Consequently, if there are fewer than 5 merge candidates in the merge candidate list before combined bi-predictive fusion candidates are included in the merge candidate list, there are up to 12 (i.e., 4*3) possible combinations of motion vectors usable in the combined bi-predictive fusion candidates. The selection of a respective pair (that is, which candidate is the first candidate and which is the second candidate) is pre-defined in HEVC, as shown in the following table:
combIdx:    0  1  2  3  4  5  6  7  8  9  10 11
l0CandIdx:  0  1  0  2  1  2  0  3  1  3  2  3
l1CandIdx:  1  0  2  0  2  1  3  0  3  1  3  2
In the table above, l0CandIdx represents the index of the first selected existing merge candidate, l1CandIdx represents the index of the second selected existing merge candidate, and combIdx represents the index of the constructed combined bi-predictive candidate.
[0038] Multilayer video encoding allows video encoding through multiple layers. Multilayer video encoding can be used to implement scalable video encoding, multi-view video encoding, and three-dimensional (3D) video encoding. In multi-view video encoding and 3D video encoding, each of the layers can correspond to a different viewpoint. In some video encoding standards, the required number of merge candidates in a merge candidate list is greater when using multi-layer video encoding than when using single-layer video encoding. The larger number of merge candidates can be allowed in order to accommodate merge candidates that specify motion information of blocks in different views.
[0039] As in the case of single-layer video encoding, when a video encoder is using multi-layer encoding and the number of merge candidates in a merge candidate list is less than the desired number of merge candidates, the video encoder can generate one or more combined bi-predictive fusion candidates. However, due to the greater number of merge candidates when using multilayer coding, there is a greater number of motion vector combinations usable in the combined bi-predictive fusion candidates. For example, if the required number of merge candidates is 6, there are up to 20 (i.e., 5*4) possible combinations of motion vectors usable in the combined bi-predictive fusion candidates.
[0040] A video encoder may not be able to generate a combined bi-predictive fusion candidate from specific pairs of merge candidates. For example, the video encoder may not be able to generate a combined bi-predictive fusion candidate if one of the merge candidates has only a single motion vector and a single reference index. In order to determine whether a combined bi-predictive fusion candidate can be generated from the motion information of a specific pair of merge candidates, the video encoder may need to retrieve information about the pair of merge candidates from a memory.
[0041] The retrieval of information from memory can be a comparatively slow process compared to other encoding processes. Furthermore, memory access requires power. Therefore, limiting the number of memory accesses may be desirable. As the number of motion vector combinations usable in combined bi-predictive fusion candidates increases, the amount of information that needs to be retrieved from memory increases. Thus, the increase in the number of required merge candidates associated with encoding video from multiple views can significantly slow down the video coding process and can use more energy than would otherwise be used.
[0042] Consequently, in accordance with an example of this disclosure, a video encoder can generate a list of merge candidates to encode a 3D video block in a way that can limit memory accesses. In this example, as part of the merge candidate list generation, the video encoder can determine whether the number of merge candidates in the list is less than 5. In response to the determination that the number of merge candidates in the list is less than 5, the video encoder may derive one or more combined bi-predictive fusion candidates. In this example, each respective combined bi-predictive fusion candidate of the combined bi-predictive fusion candidate or candidates corresponds to a respective pair of merge candidates already in the list. Furthermore, in this example, the respective combined bi-predictive fusion candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair. In this example, the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to images in different reference image lists. The video encoder can include the combined bi-predictive fusion candidate or candidates in the list. In some examples, the maximum number of merge candidates in the list is greater than 5 (equal to 6, for example). An effect of the process in this example is that the number of combinations remains limited to 12, even though the maximum number of merge candidates in the list is 6 or more. This can help speed up the coding process by reducing the amount of information retrieved from memory, and can also save energy.
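The gating behavior just described can be summarized in a compact sketch. This is not code from the 3D-HTM reference software; the helper routines (addSpatialTemporalAndInterViewCandidates, deriveCombinedBiPredictiveCandidates, addZeroMotionVectorCandidates) are hypothetical stand-ins, and only the placement of the "fewer than 5" check is the point.

```cpp
#include <vector>

struct MergeCandidate { /* motion vectors, reference indices, prediction flags */ };

// Hypothetical helpers, bodies omitted: they stand in for the spatial,
// temporal and inter-view derivation and for the artificial candidates.
static void addSpatialTemporalAndInterViewCandidates(std::vector<MergeCandidate>&) {}
static void deriveCombinedBiPredictiveCandidates(std::vector<MergeCandidate>&, int) {}
static void addZeroMotionVectorCandidates(std::vector<MergeCandidate>&, int) {}

std::vector<MergeCandidate> generateMergeCandidateList(int maxNumMergeCand /* e.g. 6 */) {
    std::vector<MergeCandidate> list;
    addSpatialTemporalAndInterViewCandidates(list);

    // Key point of this example: the combined bi-predictive derivation is
    // entered only when fewer than 5 candidates are present, so at most
    // 4 * 3 = 12 ordered pairs are ever considered, even if maxNumMergeCand is 6.
    if (static_cast<int>(list.size()) < 5) {
        deriveCombinedBiPredictiveCandidates(list, maxNumMergeCand);
    }

    // Any remaining positions are filled with zero motion vector candidates.
    addZeroMotionVectorCandidates(list, maxNumMergeCand);
    return list;
}
```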
[0043] Figure 1 is a block diagram showing an exemplary video encoding system 10 that can utilize the techniques of this disclosure. As described herein, the term "video encoder" generically refers to both video encoders and video decoders. In this disclosure, the terms "video encoding" and "encoding" can generically refer to video encoding or video decoding.
[0044] As shown in Figure 1, the video encoding system 10 includes a source apparatus 12 and a destination apparatus 14. The source apparatus 12 generates encoded video data. Therefore, the source apparatus 12 may be referred to as a video encoding apparatus or video encoding equipment. The destination apparatus 14 can decode the encoded video data generated by the source apparatus 12. Therefore, the destination apparatus 14 can be referred to as a video decoding apparatus or video decoding equipment. The source apparatus 12 and the destination apparatus 14 may be examples of video coding apparatus or video coding equipment.
[0045] The source apparatus 12 and the destination apparatus 14 may comprise a wide range of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top box converters, telephone handsets such as so-called "smart phones", televisions, cameras, display devices, digital media players, video game consoles, in-car computers, or the like.
[0046] The destination apparatus 14 may receive the encoded video data from the source apparatus 12 via a channel 16. The channel 16 may comprise one or more means or apparatus capable of moving the encoded video data from the source apparatus 12 to the destination apparatus 14. In one example, the channel 16 may comprise one or more communication means that allow the source apparatus 12 to transmit encoded video data directly to the destination apparatus 14 in real time. In this example, the source apparatus 12 can modulate the encoded video data in accordance with a communication standard, such as a wireless communication protocol, and can transmit the modulated video data to the destination apparatus 14. The communication medium or media can include wireless or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium or media may be part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The channel 16 may include various types of apparatus, such as routers, switches, base stations or other equipment that facilitate communication from the source apparatus 12 to the destination apparatus 14.
[0047] In another example, the channel 16 may include a storage medium that stores the encoded video data generated by the source apparatus 12. In this example, the destination apparatus 14 may access the storage medium by means of disk access or card access. The storage medium may include various locally accessed data storage media such as Blu-ray discs, DVDs, CD-ROMs, flash memory or other digital storage media for storing encoded video data.
[0048] In another example, the channel 16 may include a file server or other intermediate storage apparatus that stores encoded video data generated by the source apparatus 12. In this example, the destination apparatus 14 may access encoded video data stored on the file server or other intermediate storage apparatus via streaming or download. The file server can be a type of server capable of storing encoded video data and transmitting the encoded video data to the destination apparatus 14. Exemplary file servers include web servers (for a website, for example), file transfer protocol (FTP) servers, network attached storage (NAS) devices, and local disk drives.
[0049] The destination apparatus 14 can access the encoded video data through a standard data connection, such as an Internet connection. Exemplary types of data connections can include wireless channels (a Wi-Fi connection, for example), wired connections (such as DSL, cable modem, etc.), or a combination of both that are suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the file server can be a streaming transmission, a download transmission, or a combination of both.
[0050] The techniques of this disclosure are not limited to wireless applications or configurations. The techniques can be applied to video encoding in support of various multimedia applications such as over-the-air television broadcasts, cable television broadcasts, satellite television broadcasts, streaming video transmissions, e.g., through the Internet, encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, the video encoding system 10 can be configured to support unidirectional or bidirectional video transmission to support applications such as video streaming, video playback, video broadcasting and/or video telephony.
[0051] In the example of Figure 1, the source apparatus 12 includes a video source 18, a video encoder 20 and an output interface 22. In some examples, the output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. The video source 18 may include a video capture apparatus such as a video camera, a video file containing previously captured video data, a video feed interface for receiving video data from a content provider, and/or a computer graphics system for generating video data, or a combination of such video data sources.
[0052] The video encoder 20 can encode video data from the video source 18. In some examples, the source apparatus 12 directly transmits the encoded video data to the destination apparatus 14 via the output interface 22. In other examples, the encoded video data may also be stored on a storage medium or on a file server for later access by the destination apparatus 14 for decoding and/or playback.
[0053] In the example of Figure 1, the destination apparatus 14 includes an input interface 28, a video decoder 30 and a display apparatus 32. In some examples, the input interface 28 includes a receiver and/or a modem. The input interface 28 can receive encoded video data over the channel 16. The display apparatus 32 may be integrated with, or may be external to, the destination apparatus 14. In general, the display apparatus 32 displays decoded video data. The display apparatus 32 may comprise various display apparatus, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display apparatus. In accordance with this disclosure, the video encoder 20 and the video decoder 30 can perform one or more techniques described herein as part of a video coding process (video encoding or video decoding, for example).
[0054] Figure 1 is merely an example, and the techniques of this disclosure may apply to video coding configurations (video encoding or video decoding, for example) that do not necessarily include any data communication between the encoding apparatus and the decoding apparatus. In other examples, data is retrieved from local memory, continuously streamed through a network, or the like. A video encoding apparatus can encode and store data in memory, and/or a video decoding apparatus can retrieve and decode data from memory. In many instances, video encoding and video decoding are performed by apparatus that do not communicate with each other, but simply encode data to memory and/or retrieve and decode data from memory.
[0055] The video encoder 20 and the video decoder 30 can each be implemented as any of a variety of suitable circuits, such as one or more processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the techniques are implemented partially in software, an apparatus may store instructions for the software on a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing elements (including hardware, software, a combination of software and hardware, etc.) can be considered to be one or more processors. Each of the video encoder 20 and the video decoder 30 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined encoder/decoder (CODEC) in a respective apparatus.
[0056] This disclosure may generally refer to the "signaling" by the video encoder 20 of certain information. The term "signaling" may refer broadly to communicating syntax elements and/or other data used to decode the compressed video data. Such communication can take place in real or near real time. Alternatively, such communication can occur over a span of time, as can occur when storing syntax elements in an encoded bit stream on a computer-readable storage medium at the time of encoding, from which a video decoding apparatus can then retrieve them at any time after they have been stored on this medium. In some examples, from an encoder's perspective, signaling may include generating an encoded bit stream, and, from a decoder's perspective, signaling may include receiving and parsing an encoded bit stream.
[0057] In some examples, the video encoder 20 and the video decoder 30 operate in accordance with a video compression standard such as ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), which includes its Scalable Video Coding (SVC) and Multi-view Video Coding (MVC) extensions. The latest joint draft of MVC is described in ITU-T Recommendation H.264, "Advanced Video Coding for Generic Audiovisual Services", March.
[0058] In other examples, the video encoder 20 and the video decoder 30 may operate in accordance with other video compression standards, which include the High Efficiency Video Coding (HEVC) standard developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). A draft of the HEVC standard, referred to as "HEVC Working Draft 9", is described in Bross et al., "High Efficiency Video Coding (HEVC) text specification draft 9", Joint Collaboration Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 11th Meeting: Shanghai, China, October 2012, which can be downloaded from http://phenix.int-evry.fr/jct/doc_end_user/documents/11_Shanghai/wg11/JCTVC-K1003-v8.zip. Another recent draft of the HEVC standard, referred to as "HEVC Working Draft 10" or "WD10", is described in JCTVC-L1003v34, Bross et al., "High Efficiency Video Coding (HEVC) text specification draft 10 (for FDIS & Last Call)", Joint Collaboration Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 12th Meeting: Geneva, CH, January 14-23, 2013, downloadable from http://phenix.int-evry.fr/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip.
Yet another draft of the HEVC standard is referred to herein as the "WD10 revisions", described in Bross et al., "Editors' proposed corrections to HEVC version 1", Joint Collaboration Team on Video Coding (JCT-VC).
[0059] Currently, a VCEG and MPEG 3D Video Coding Joint Collaboration Team (JCT-3C) is developing a 3DV standard based on HEVC, for which part of the standardization efforts includes the standardization of the multi-view video codec based on HEVC (MV-HEVC) and another part for 3D video coding based on HEVC (3D-HEVC). For 3D-HEVC, new coding tools can be included and supported, including at the coding unit/prediction unit level, for both texture and depth views. Software for 3D-HEVC (i.e., 3D-HTM) can be downloaded from the following link [3D-HTM version 8.0]: https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-8.0/. An operational draft of 3D-HEVC (i.e., Tech et al., "3D-HEVC Draft Text 1", Joint Collaboration Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting, Vienna, AT, 27 July - 2 August 2013, document number JCT3V-E1001-v2 (hereinafter "JCT3V-E1001" or "3D-HEVC Draft Text 1")) is available from: http://phenix.it-sudparis.eu/jct2/doc_end_user/documents/5_Vienna/wg11/JCT3V-E1001-v3.zip. A software description of 3D-HEVC (Zhang et al., "3D-HEVC Test Model 3", Joint Collaboration Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 3rd Meeting, Geneva, CH, January 12-23, 2013, document number JCT3V-C1005_d0 (hereinafter "JCT3V-C1005")) is also available.
[0060] As mentioned briefly above, the video encoder 20 encodes video data. The video data can correspond to one or more images. Each of the images is a still image that is part of a video. When the video encoder 20 encodes the video data, the video encoder 20 can generate a bit stream. The bit stream can include a sequence of bits that form an encoded representation of the video data. The bit stream can include encoded images and related data. An encoded image is an encoded representation of an image. The related data may include sequence parameter sets (SPSs), picture parameter sets (PPSs), video parameter sets (VPSs), adaptive parameter sets (APSs), slice headers, block headers and other syntax structures.
[0061] An image can include three arrays of samples, denoted SL, SCb and SCr. SL is a two-dimensional array (i.e., a block) of luma samples. Luma samples may also be referred to herein as "Y" samples. SCb is a two-dimensional array of Cb chrominance samples. SCr is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as "chroma" samples. Cb chrominance samples may be referred to herein as "U samples". Cr chrominance samples may be referred to herein as "V samples".
[0062] In some examples, the video encoder 20 may subsample the chroma arrays of an image (i.e., SCb and SCr). For example, the video encoder 20 can use a YUV 4:2:0 video format, a YUV 4:2:2 video format, or a YUV 4:4:4 video format. In the YUV 4:2:0 video format, the video encoder 20 can subsample the chroma arrays so that the chroma arrays are half the height and half the width of the luma array. In the YUV 4:2:2 video format, the video encoder 20 can subsample the chroma arrays so that the chroma arrays are half the width and the same height as the luma array. In the YUV 4:4:4 video format, the video encoder 20 does not subsample the chroma arrays.
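As a small worked example of the subsampling ratios just described, the following sketch computes the chroma plane dimensions from the luma dimensions for the three formats. The enum and function names are illustrative, not from any codec API.

```cpp
#include <cstdio>
#include <utility>

enum class ChromaFormat { YUV420, YUV422, YUV444 };

// Returns {chromaWidth, chromaHeight} for a given luma plane size.
std::pair<int, int> chromaPlaneSize(int lumaWidth, int lumaHeight, ChromaFormat fmt) {
    switch (fmt) {
        case ChromaFormat::YUV420: return {lumaWidth / 2, lumaHeight / 2}; // half width, half height
        case ChromaFormat::YUV422: return {lumaWidth / 2, lumaHeight};     // half width, same height
        case ChromaFormat::YUV444: return {lumaWidth, lumaHeight};         // no subsampling
    }
    return {lumaWidth, lumaHeight};
}

int main() {
    auto [w, h] = chromaPlaneSize(1920, 1080, ChromaFormat::YUV420);
    std::printf("4:2:0 chroma plane: %dx%d\n", w, h); // prints 960x540
}
```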
[0063] To generate an encoded representation of an image, the video encoder 20 can generate a set of coding tree units (CTUs). Each of the CTUs may be a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to encode the samples of the coding tree blocks. In a monochrome image or an image that has three separate color planes, a CTU may comprise a single coding tree block and syntax structures used to encode the samples of the coding tree block. A coding tree block (CTB) can be an NxN block of samples. A CTU may also be referred to as a "tree block" or "largest coding unit" (LCU). The CTUs of HEVC can be broadly analogous to the macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a specific size and may include one or more coding units (CUs).
[0064] As part of encoding an image, the video encoder 20 can generate encoded representations of each slice of the image (i.e., encoded slices). To generate an encoded slice, the video encoder 20 can encode a series of CTUs. This disclosure may refer to an encoded representation of a CTU as an encoded CTU. In some examples, each of the slices includes an integer number of encoded CTUs.
[0065] To generate an encoded CTU, the video encoder 20 can recursively perform quad-tree partitioning on the coding tree blocks of a CTU so as to divide the coding tree blocks into coding blocks, hence the name "coding tree units". A coding block is an NxN block of samples. A CU can be a coding block of luma samples and two corresponding coding blocks of chroma samples of an image that has an array of luma samples, an array of Cb samples and an array of Cr samples, and syntax structures used to encode the samples of the coding blocks. In a monochrome image or an image that has three separate color planes, a CU may comprise a single coding block and syntax structures used to encode the samples of the coding block.
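The recursive quad-tree splitting of a coding tree block into coding blocks can be pictured with a short sketch. The split decision callback here is a stand-in (in a real encoder it would come from rate-distortion decisions, and in a decoder from parsed split flags); only the recursion pattern is the point.

```cpp
#include <functional>
#include <vector>

struct CodingBlock { int x, y, size; };

// Recursively split an NxN coding tree block into coding blocks.
// `shouldSplit` is a hypothetical decision callback; `minSize` bounds recursion.
void splitQuadTree(int x, int y, int size, int minSize,
                   const std::function<bool(int, int, int)>& shouldSplit,
                   std::vector<CodingBlock>& leaves) {
    if (size > minSize && shouldSplit(x, y, size)) {
        int half = size / 2;
        splitQuadTree(x,        y,        half, minSize, shouldSplit, leaves);
        splitQuadTree(x + half, y,        half, minSize, shouldSplit, leaves);
        splitQuadTree(x,        y + half, half, minSize, shouldSplit, leaves);
        splitQuadTree(x + half, y + half, half, minSize, shouldSplit, leaves);
    } else {
        leaves.push_back({x, y, size}); // this block becomes a CU's coding block
    }
}
```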
[0066] The video encoder 20 can partition a coding block of a CU into one or more prediction blocks. A prediction block can be a rectangular (i.e., square or non-square) block of samples to which the same prediction is applied. A prediction unit (PU) of a CU can be a prediction block of luma samples, two corresponding prediction blocks of chroma samples of an image, and syntax structures used to predict the prediction block samples. In a monochrome image or an image that has three separate color planes, a PU can comprise a single prediction block and syntax structures used to predict the prediction block samples. The video encoder 20 can generate a predictive block for each prediction block of a PU. For example, the video encoder 20 can generate predictive luma, Cb and Cr blocks for the luma, Cb and Cr prediction blocks of each PU of the CU. Predictive blocks can also be referred to as predictive sample blocks.
[0067] The video encoder 20 can use intra-prediction or inter-prediction to generate the predictive blocks for a PU. If the video encoder 20 uses intra-prediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of the image associated with the PU.
[0068] If the video encoder 20 uses inter-prediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of one or more images other than the image associated with the PU. The video encoder 20 can use uni-prediction or bi-prediction to generate the predictive blocks of a PU. When the video encoder 20 uses uni-prediction to generate the predictive blocks for a PU, the PU can have a single motion vector. When the video encoder 20 uses bi-prediction to generate the predictive blocks for a PU, the PU can have two motion vectors.
[0069] After the video encoder 20 generates predictive blocks (predictive luma, Cb and Cr blocks, for example) for one or more PUs of a CU, the video encoder 20 can generate one or more residual blocks for the CU. Each sample in a residual block for the CU can indicate the difference between a sample in a predictive block of a PU of the CU and a corresponding sample in a coding block of the CU. For example, the video encoder 20 can generate a residual luma block for the CU. Each sample in the residual luma block of the CU can indicate the difference between a luma sample in a predictive luma block of a PU of the CU and a corresponding sample in the original luma coding block of the CU. Furthermore, the video encoder 20 can generate a residual Cb block for the CU. Each sample in the residual Cb block of the CU can indicate the difference between a Cb sample in a predictive Cb block of a PU of the CU and a corresponding sample in the original Cb coding block of the CU. The video encoder 20 can also generate a residual Cr block for the CU. Each sample in the residual Cr block of the CU can indicate the difference between a Cr sample in a predictive Cr block of a PU of the CU and a corresponding sample in the original Cr coding block of the CU.
[0072] The video encoder 20 can apply one or more transforms to a transform block of a TU in order to generate a coefficient block for the TU. A coefficient block can be a two-dimensional array of transform coefficients. A transform coefficient can be a scalar quantity. For example, the video encoder 20 may apply one or more transforms to a luma transform block of a TU in order to generate a block of luma coefficients for the TU. The video encoder 20 can apply one or more transforms to a Cb transform block of a TU in order to generate a block of Cb coefficients for the TU. The video encoder 20 can apply one or more transforms to a Cr transform block of a TU in order to generate a block of Cr coefficients for the TU.
[0073] After generating a coefficient block (a block of luma coefficients, a block of Cb coefficients or a block of Cr coefficients, for example), the video encoder 20 can quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, obtaining additional compression. After the video encoder 20 quantizes a coefficient block, the video encoder 20 can entropy encode syntax elements that indicate the quantized transform coefficients. For example, the video encoder 20 can perform Context-Adaptive Binary Arithmetic Coding (CABAC) on the syntax elements that indicate the quantized transform coefficients. The video encoder 20 can transmit the entropy-encoded syntax elements in a bit stream. The bit stream can also include syntax elements that are not entropy encoded.
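A rough illustration of the quantization step follows. This is a simplified uniform quantizer, not the exact HEVC quantization formula; the scale, shift and offset parameters are placeholders for the QP-dependent values a real codec would use.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Simplified uniform quantization of a block of transform coefficients.
// scale/shift/offset stand in for the QP-dependent parameters of a real codec.
std::vector<int32_t> quantize(const std::vector<int32_t>& coeffs,
                              int32_t scale, int32_t shift, int32_t offset) {
    std::vector<int32_t> levels(coeffs.size());
    for (size_t i = 0; i < coeffs.size(); ++i) {
        int64_t magnitude =
            (static_cast<int64_t>(std::abs(coeffs[i])) * scale + offset) >> shift;
        levels[i] = coeffs[i] < 0 ? -static_cast<int32_t>(magnitude)
                                  :  static_cast<int32_t>(magnitude);
    }
    return levels;
}
```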
[0074] The video decoder 30 can receive a bit stream generated by the video encoder 20.
[0075] In some cases, the video encoder can signal the motion information of a PU using the merge mode or the skip mode, or possibly an advanced motion vector prediction (AMVP) mode. In other words, in the HEVC standard there are two inter-prediction modes for a PU, namely the merge mode (skip is considered a special case of merge) and the AMVP mode. In either the merge mode or the AMVP mode, a video encoder maintains a list of motion vector candidates for multiple motion vector predictors. For ease of explanation, this disclosure may refer to a list of motion vector candidates for the merge mode as a "merge candidate list". Likewise, this disclosure may refer to a list of motion vector candidates for the AMVP mode as an AMVP candidate list. The motion information of a PU can include the motion vector or vectors of the PU and/or the reference index or indices of the PU.
[0076] When the video encoder 20 signals the motion information of the current PU using the merge mode, the video encoder 20 generates a merge candidate list. The merge candidate list includes a set of candidates. Candidates in a merge candidate list may be referred to as "merge candidates". The candidates can indicate the motion information of PUs that are spatial or temporal neighbors of the current PU. PUs that are spatial neighbors of the current PU can have prediction blocks adjacent to a prediction block of the current PU in the same image as the current PU. PUs that are temporal neighbors of the current PU may be in a different image from the current PU. The video encoder can then select a candidate from the candidate list and can use the motion information indicated by the selected candidate as the motion information of the current PU. Furthermore, in the merge mode, the video encoder 20 can signal the position in the candidate list of the selected candidate. For example, the video encoder 20 may signal a merge index (merge_idx, for example) that indicates the position in the merge candidate list of the selected merge candidate. The video decoder 30 can generate the same candidate list and can determine, based on the indication of the position of the selected candidate (the position indicated by the merge index, for example), the selected candidate. The video decoder 30 can then use the motion information of the selected candidate to generate one or more predictive blocks (predictive samples, for example) for the current PU. The video decoder 30 can reconstruct samples based on the predictive blocks (predictive samples, for example) for the current PU and a residual signal. In this way, a video encoder can generate the motion vector or vectors, as well as the reference indices in the merge mode, of the current PU by taking a candidate from the motion vector candidate list.
[0077] The skip mode is similar to the merge mode in that the video encoder 20 generates a candidate list and selects a candidate from the candidate list. However, when the video encoder 20 signals the motion information of the current PU (a depth block, for example) using the skip mode, the video encoder 20 can avoid generating any residual signal. Since the skip mode has the same motion vector derivation process as the merge mode, the techniques described in this document can be applied to both the merge mode and the skip mode. One or more aspects of this disclosure can be used in the AMVP mode or in other modes that use candidate lists.
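The decoder-side half of the merge/skip mechanism described above amounts to indexing into a list that the decoder has rebuilt identically to the encoder's list. The sketch below assumes the list construction has already happened; the structure and function names are illustrative only.

```cpp
#include <cassert>
#include <vector>

struct MergeCandidate { /* motion vectors, reference indices, prediction flags */ };

// Decoder-side use of a signaled merge index: the current PU inherits the
// motion information of the candidate at position mergeIdx in a list that
// the decoder has rebuilt identically to the encoder's list.
MergeCandidate decodeMergeMotionInfo(const std::vector<MergeCandidate>& mergeList,
                                     int mergeIdx, bool isSkip) {
    assert(mergeIdx >= 0 && mergeIdx < static_cast<int>(mergeList.size()));
    MergeCandidate inherited = mergeList[mergeIdx];
    if (isSkip) {
        // Skip mode: same motion derivation as merge mode, but no residual is
        // coded, so the prediction samples become the reconstruction directly.
    }
    return inherited;
}
```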
[0078] The AMVP mode is similar to the merge mode in that the video encoder 20 generates a candidate list and selects a candidate from the candidate list. However, when the video encoder signals the motion information of the current PU (a depth block, for example) using the AMVP mode, the video encoder 20 may signal a motion vector difference (MVD) for the current PU and a reference index, in addition to signaling the position of the selected candidate in the candidate list. An MVD for the current PU can indicate the difference between a motion vector of the PU and the motion vector of the selected motion vector candidate. In uni-prediction, the video encoder 20 can signal one MVD and one reference index for the current PU. In bi-prediction, the video encoder 20 can signal two MVDs and two reference indices for the current PU. In some examples, the video encoder 20 may typically signal one MVD and one reference index for the current PU, although depth block prediction may also use techniques similar to bi-prediction, in which two MVDs and two reference indices are signaled.
[0079] In addition, when the motion information of the current PU is signaled using the AMVP mode, the video decoder 30 can generate the same candidate list and can determine, based on the indication of the position of the selected candidate, the selected candidate. The video decoder 30 can recover the motion vector of the current PU by adding an MVD to the motion vector of the selected candidate. The video decoder 30 can then use the motion vector or motion vectors recovered for the current PU to generate predictive blocks for the current PU.
[0080] In some examples, the motion vector candidate list contains up to five candidates for the merge mode and only two candidates for the AMVP mode. In other words, a merge candidate list can include up to five candidates, while an AMVP candidate list can only include two candidates. A merge candidate (that is, a candidate in a list of motion vector candidates for the merge mode) can contain motion vectors that correspond to both reference index lists (list 0 and list 1) and reference indices. If a merge candidate is identified by a merge index, the reference images used for the prediction of the current blocks, as well as the associated motion vectors, are determined. However, in the AMVP mode, for each potential prediction direction from either list 0 or list 1, a reference index is explicitly signaled, along with a motion vector predictor index into the motion vector candidate list, since an AMVP candidate contains only a motion vector. In the AMVP mode, the predicted motion vectors can also be refined.
[0081] As indicated above, a video encoder can derive candidates for the merge mode from spatial and temporal neighbor blocks. The video encoder can derive the maximum number of merge candidates from the five_minus_max_num_merge_cand syntax element, which is signaled in a slice header for a slice. The five_minus_max_num_merge_cand syntax element specifies the maximum number of merge candidates supported in the slice, subtracted from 5. The video encoder can derive the maximum number of merge candidates, MaxNumMergeCand, as follows:
MaxNumMergeCand = 5 - five_minus_max_num_merge_cand    (7-39)
The value of MaxNumMergeCand is in the range of 1 to 5, inclusive.
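Equation (7-39) translates directly into code. The adjustment to a maximum of 6 candidates in the 3D/multi-view context discussed in this disclosure is only mentioned in a comment, since the exact extension syntax is not reproduced here.

```cpp
// Equation (7-39): MaxNumMergeCand = 5 - five_minus_max_num_merge_cand,
// giving a value in the range 1..5 for base HEVC.
int deriveMaxNumMergeCand(int fiveMinusMaxNumMergeCand) {
    int maxNumMergeCand = 5 - fiveMinusMaxNumMergeCand;
    // In the 3D/multi-view context discussed in this disclosure, the merge list
    // may instead hold up to 6 candidates (e.g. when an additional inter-view
    // merge candidate is enabled); that adjustment is omitted here.
    return maxNumMergeCand;
}
```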
[0082] A video encoder can build the merge candidate list with the following steps. First, the video encoder can derive up to 4 spatial motion vector candidates from the 5 spatial neighbor blocks shown in Figure 2. Figure 2 is a conceptual illustration showing the spatial neighbors that are potential candidates for the merge list. The arrows indicate which spatial candidate(s) are compared. The video encoder can derive the spatial motion vector candidates in the following order: left (A1), above (B1), above-right (B0), below-left (A0), and above-left (B2), as shown in Figure 2. In addition, the video encoder can apply a pruning process to remove identical spatial motion vector candidates. For example, the video encoder can compare B1 with A1, compare B0 with B1, compare A0 with A1, and compare B2 with both B1 and A1. If there are already four merge candidates available after the pruning process, the video encoder does not insert B2 into the merge candidate list.
[0083] Second, the video encoder can determine temporal merge candidates. For example, the video encoder can add a temporal motion vector predictor (TMVP) candidate from a co-located reference image (if enabled and available) to the merge candidate list (that is, the list of motion vector candidates), after the spatial motion vector candidates, if the list of motion vector candidates is not yet complete.
[0084] Third, if the merge candidate list is still not complete, the video encoder can generate and insert artificial motion vector candidates at the end of the merge candidate list until the merge candidate list has all the candidates (that is, the number of candidates indicated by MaxNumMergeCand). In other words, the video encoder can insert artificial motion vector candidates into the merge candidate list if the number of merge candidates in the merge candidate list is less than MaxNumMergeCand. There are two types of artificial motion vector candidates: combined bi-predictive fusion candidates (which are derived only for B slices) and zero motion vector merge candidates. The merge candidate list can include one or more zero motion vector merge candidates if the first type (i.e., combined bi-predictive fusion candidates) does not provide enough artificial candidates.
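The first of the three construction steps above, the spatial candidates with their specific pruning comparisons, can be sketched as below. Availability and the motion-information comparison are abstracted behind stub helpers (available, motionOf, sameMotion), which are assumptions of this sketch rather than real API calls; the order A1, B1, B0, A0, B2 and the particular pairs compared follow the description above, and the list is assumed to start empty.

```cpp
#include <vector>

struct MergeCandidate { /* motion vectors, reference indices, prediction flags */ };

enum SpatialPos { A1, B1, B0, A0, B2 };

// Stub helpers standing in for neighbor availability, motion fetch, and the
// "identical motion information" test used for pruning.
bool available(SpatialPos) { return true; }                            // stub
MergeCandidate motionOf(SpatialPos) { return {}; }                     // stub
bool sameMotion(const MergeCandidate&, const MergeCandidate&) { return false; } // stub

// Derive up to four spatial merge candidates in the order A1, B1, B0, A0, B2.
void addSpatialCandidates(std::vector<MergeCandidate>& list) {
    const bool hasA1 = available(A1);
    if (hasA1) list.push_back(motionOf(A1));

    const bool hasB1 = available(B1) && !(hasA1 && sameMotion(motionOf(B1), motionOf(A1)));
    if (hasB1) list.push_back(motionOf(B1));

    if (available(B0) && !(hasB1 && sameMotion(motionOf(B0), motionOf(B1))))
        list.push_back(motionOf(B0));

    if (available(A0) && !(hasA1 && sameMotion(motionOf(A0), motionOf(A1))))
        list.push_back(motionOf(A0));

    // B2 is used only if fewer than four candidates were found so far, and it
    // is pruned against both B1 and A1.
    if (list.size() < 4 && available(B2)
        && !(hasB1 && sameMotion(motionOf(B2), motionOf(B1)))
        && !(hasA1 && sameMotion(motionOf(B2), motionOf(A1))))
        list.push_back(motionOf(B2));
}
```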
[0085] When the current slice (that is, the slice that the video encoder is currently encoding) is a B slice, the video encoder can invoke a derivation process for combined bi-predictive fusion candidates. In at least some examples, a B slice is a slice in which intra-prediction, unidirectional inter-prediction and bidirectional inter-prediction are allowed. When the derivation process is invoked, the video encoder can, for each pair of merge candidates that are already in the merge candidate list and have the necessary motion information, derive a combined bi-predictive motion vector candidate (with an index denoted combIdx) by a combination of the motion vector (and, in some cases, the reference index) of a first merge candidate of the pair (with merge candidate index equal to l0CandIdx) referring to an image in list 0 (if available) and the motion vector (and, in some cases, the reference index) of a second merge candidate of the pair (with merge candidate index equal to l1CandIdx) referring to an image in list 1 (if available and either the reference image or the motion vector is different from that of the first candidate). The pair of merge candidates can be an ordered pair, in the sense that different orderings of the same two merge candidates are considered different pairs. The definitions of l0CandIdx and l1CandIdx for each combIdx are shown in Table 1 below:
Table 1: Specification of l0CandIdx and l1CandIdx
combIdx:    0  1  2  3  4  5  6  7  8  9  10 11
l0CandIdx:  0  1  0  2  1  2  0  3  1  3  2  3
l1CandIdx:  1  0  2  0  2  1  3  0  3  1  3  2
[0086] In Table 1, the row for l0CandIdx indicates the merge candidate indices from which to draw the RefPicList0 motion information (motion vectors, reference indices, for example). Likewise, in Table 1, the row for l1CandIdx indicates the merge candidate indices from which to draw the RefPicList1 motion information. Thus, the column for combination 0 (that is, combIdx = 0) indicates that a combined bi-predictive motion vector candidate specifies the RefPicList0 motion information of merge candidate 0 and specifies the RefPicList1 motion information of merge candidate 1. Since not all merge candidates necessarily have the motion information applicable to a combination (merge candidate 1 may not have RefPicList1 motion information, for example), or the RefPicList0 motion information associated with one merge candidate and the RefPicList1 motion information associated with the other merge candidate may be identical, a video encoder can process the combinations of Table 1 in the order of combIdx until there are no remaining combinations available or the video encoder has generated a sufficient number of combined bi-predictive motion vector candidates.
[0087] For combIdx being 0 ... 11, the combined bi-predictive motion vector candidate generation process is terminated when one of the following conditions is true: (1) combIdx is equal to (numOrigMergeCand * (numOrigMergeCand - 1)), where numOrigMergeCand denotes the number of candidates in the merge list before this process was invoked; or (2) the number of total candidates (including the newly generated combined bi-predictive fusion candidates) in the merge list equals MaxNumMergeCand.
[0088] As noted above, a video encoder can include one or more zero motion vector merge candidates in a merge candidate list. For each respective zero motion vector merge candidate, the motion vector of the respective zero motion vector merge candidate is set to 0, and the reference index of the respective zero motion vector merge candidate ranges from zero up to the number of available reference indices minus 1. If the number of merge candidates in the merge candidate list is still less than MaxNumMergeCand, the video encoder can insert one or more zero motion vector candidates (with zero reference indices and motion vectors, for example) until the total number of merge candidates in the merge candidate list equals MaxNumMergeCand.
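The derivation described in paragraphs [0085] to [0088] can be summarized in one sketch. The l0CandIdx/l1CandIdx arrays are those of Table 1; the candidate structure, the "identical motion" test (which a real codec performs on the actual reference images rather than on reference indices) and the zero-candidate filling are simplified, and this is not code from the HM/HTM reference software.

```cpp
#include <array>
#include <vector>

struct MV { int x = 0, y = 0; };
struct MergeCandidate {
    std::array<MV, 2>  mv;                   // [0] -> RefPicList0, [1] -> RefPicList1
    std::array<int, 2> refIdx{{-1, -1}};     // -1: that list is not used
    bool usesList(int l) const { return refIdx[l] >= 0; }
};

// Table 1: order in which ordered pairs of existing candidates are tried.
static const int kL0CandIdx[12] = {0, 1, 0, 2, 1, 2, 0, 3, 1, 3, 2, 3};
static const int kL1CandIdx[12] = {1, 0, 2, 0, 2, 1, 3, 0, 3, 1, 3, 2};

void addCombinedBiPredictiveCandidates(std::vector<MergeCandidate>& list,
                                       int maxNumMergeCand) {
    const int numOrig = static_cast<int>(list.size());
    for (int combIdx = 0; combIdx < 12; ++combIdx) {
        // Termination conditions of the derivation process ([0087]).
        if (combIdx == numOrig * (numOrig - 1)) break;
        if (static_cast<int>(list.size()) == maxNumMergeCand) break;

        const int i0 = kL0CandIdx[combIdx];
        const int i1 = kL1CandIdx[combIdx];
        if (i0 >= numOrig || i1 >= numOrig) continue;

        const MergeCandidate& c0 = list[i0];
        const MergeCandidate& c1 = list[i1];
        // The list 0 motion of the first candidate and the list 1 motion of the
        // second must both be available and must not be identical.
        if (!c0.usesList(0) || !c1.usesList(1)) continue;
        const bool identical = (c0.refIdx[0] == c1.refIdx[1] &&        // simplified check
                                c0.mv[0].x == c1.mv[1].x && c0.mv[0].y == c1.mv[1].y);
        if (identical) continue;

        MergeCandidate combined;
        combined.mv[0] = c0.mv[0]; combined.refIdx[0] = c0.refIdx[0];
        combined.mv[1] = c1.mv[1]; combined.refIdx[1] = c1.refIdx[1];
        list.push_back(combined);
    }
    // Afterwards, zero motion vector candidates fill any remaining positions
    // (the reference index assignment of [0088] is simplified here).
    while (static_cast<int>(list.size()) < maxNumMergeCand) {
        MergeCandidate zero;       // zero motion vectors
        zero.refIdx = {0, 0};
        list.push_back(zero);
    }
}
```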
Thus, when a view includes both coded texture and coded depth representations, a view component may comprise (for example, consist of) a texture view component and a depth view component. In some examples, a texture view component is a coded representation of the texture of a view in a single access unit. Also, in some examples, a depth view component is a coded representation of the depth of a view in a single access unit. A depth view component may also be referred to as a depth picture. [0090] [0090] Each texture view component includes actual picture content to be displayed. For example, a texture view component can include luma (Y) and chroma (Cb and Cr) components. Each depth view component can indicate the relative depths of the pixels in its corresponding texture view component. In some examples, the depth view components are grayscale pictures that include only luma values. In other words, the depth view components may not convey any picture content, but instead may provide measures of the relative depths of the pixels in the corresponding texture view components. [0091] [0091] For example, a purely white pixel in a depth view component can indicate that the corresponding pixel or pixels in the corresponding texture view component are closer, from the perspective of the viewer. In this example, a purely black pixel in the depth view component indicates that the corresponding pixel or pixels in the corresponding texture view component are further away, from the perspective of the observer. The various shades of gray between black and white indicate different levels of depth. For example, a dark gray pixel in a depth view component indicates that its corresponding pixel in the texture view component is further away than the pixel corresponding to a light gray pixel in the depth view component. In this example, since only a gray scale is needed to identify pixel depth, the depth view components need not include chroma components, as chroma components for the depth view components would serve no purpose. This disclosure presents the example of depth view components that use only luma values (for example, intensity values) to identify depth for purposes of illustration, and it should not be considered limiting. In other examples, other techniques can be used to indicate the relative depths of pixels in texture view components. [0092] [0092] In multi-view coding, a bitstream can have a series of layers. Each of the layers can comprise a different view. In multi-view coding, a view may be referred to as a "base view" if a video decoder (the video decoder 30, for example) can decode pictures in the view without reference to pictures in any other view. A view can be referred to as a non-base view if decoding the view depends on decoding pictures in one or more other views. When coding a picture in one of the non-base views, a video encoder (such as video encoder 20 or video decoder 30, for example) can add a picture to a reference picture list if the picture is in a different view but within the same time instance (that is, the same access unit) as the picture the video encoder is currently coding. Like other inter-prediction reference pictures, the video encoder can insert an inter-view prediction reference picture at any position in a reference picture list. [0093] [0093] In 3D-HEVC, a disparity vector (DV) can be used as an estimator of the displacement between two views.
Since neighboring blocks share almost the same motion/disparity information in video coding, the current block can use the motion vector information of neighboring blocks as a good predictor. Following this idea, the neighboring-block-based disparity vector (NBDV) derivation process uses neighboring motion vector information to estimate the disparity vector between different views. 3D-HEVC first adopted the neighboring block (based) disparity vector (NBDV) method proposed in the following document: Zhang et al., "3D-CE5.h: Disparity Vector Generation Results", Joint Collaborative Team on 3D Video Coding Extension Development of [0094] [0094] Several spatial and temporal neighboring blocks are defined in the NBDV process. A video encoder performing the NBDV process checks each of the spatial and temporal neighboring blocks in a predefined order determined by the priority of the correlation between the current block and the candidate block (spatial or temporal neighboring block). Thus, in the NBDV process, the video encoder uses two sets of neighboring blocks. One set of neighboring blocks is the spatial neighboring blocks and the other set is the temporal neighboring blocks. When the video encoder checks a neighboring block, the video encoder can determine whether the neighboring block has a disparity motion vector (that is, a motion vector that indicates an inter-view reference picture). Once the video encoder finds a disparity motion vector, the video encoder can convert the disparity motion vector into a disparity vector. For example, to convert the disparity motion vector into the disparity vector, the video encoder can set the disparity vector equal to the disparity motion vector. Meanwhile, the view order index of the associated reference view is also returned. In other words, as part of performing the NBDV process, the video encoder can also determine a reference view order index. [0095] [0095] In some versions of 3D-HEVC, the video encoder uses two spatial neighboring blocks in the NBDV process for the derivation of disparity vectors. The two spatial neighboring blocks are at [0096] [0096] In some examples, to check temporal neighboring blocks in the NBDV process, the video encoder can first perform a construction process to generate a candidate picture list. Up to two reference pictures of the current view (that is, the view that includes the picture currently being coded) can be treated as candidate pictures. A co-located reference picture (that is, a co-located picture) is first inserted into the candidate picture list, followed by the rest of the candidate pictures (that is, all reference pictures in RefPicList0 and RefPicList1) in ascending order of reference index. [0097] [0097] If the current slice of the current picture is a B slice (that is, a slice that is allowed to include bidirectionally inter-predicted PUs), the video encoder 20 can signal, in a slice header, a syntax element (for example, collocated_from_l0_flag) that indicates whether the co-located picture is from RefPicList0 or RefPicList1. In other words, when the use of TMVPs is enabled for the current slice and the current slice is a B slice (a slice that is allowed to include bidirectionally inter-predicted PUs), the video encoder can signal a syntax element (for example, collocated_from_l0_flag) in a slice header to indicate whether the [0098] [0098] When two reference pictures with the same reference index in both reference picture lists are available, the reference picture in the same reference picture list as the co-located picture precedes the other reference picture.
For each candidate picture in the candidate picture list, the video encoder can determine the block of the co-located region that covers the center position as the temporal neighboring block. [0099] [0099] When a block is coded with inter-view motion prediction, the video encoder may need to derive a disparity vector to select a corresponding block in a different view. An implicit disparity vector (IDV, also referred to as a derived disparity vector) is a disparity vector derived in this way for a block coded with inter-view motion prediction. [0100] [0100] In at least some 3D-HEVC designs, the NBDV process checks, in order, disparity motion vectors in the temporal neighboring blocks, disparity motion vectors in the spatial neighboring blocks, and then the IDVs. Once the video encoder finds a disparity motion vector or an IDV, the video encoder terminates the NBDV process. [0101] [0101] In some examples, when a video encoder derives a disparity vector using the NBDV process, the video encoder further refines the disparity vector by retrieving depth data from a depth map (that is, a depth view component of the reference view). This refinement process is called depth-oriented NBDV (DoNBDV) and can include the following two steps. First, locate a depth block corresponding to the derived disparity vector in the previously coded reference depth view, such as in the base view; the size of the corresponding depth block is identical to that of the current PU. Second, select one depth value from the four corner pixels of the corresponding depth block (due to the adoption of Chang et al., "3D-CE2.h related: Simplified DV derivation for DoNBDV and BVSP", Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 4th Meeting: Incheon, KR, April 20-26, 2013, document JCT3V-D0138 (hereinafter "JCT3V-D0138")) and convert the selected depth value into the [0103] [0103] For merge/skip mode, the video encoder can derive an inter-view predicted motion vector by the following steps. First, the video encoder can locate a corresponding block of the current PU/CU in a reference view of the same access unit using the disparity vector. Second, if the corresponding block is not intra-coded and not inter-view predicted, and its reference picture has a picture order count (POC) value equal to that of an entry in the same reference picture list of the current PU/CU, the video encoder can derive its motion information (prediction direction, reference pictures, and motion vectors) after [0104] [0104] Figure 4 shows an example of the inter-view predicted motion vector candidate derivation process. In particular, Figure 4 is a conceptual illustration showing the derivation of an inter-view predicted motion vector candidate for the merge/skip mode. In the example of Figure 4, the current PU 40 occurs in view V1 at time instance T1. A reference PU 42 for the current PU 40 occurs in a view different from that of the current PU (that is, view V0) and at the same time instance as the current PU 40 (that is, time instance T1). In the example of Figure 4, the reference PU 42 is bidirectionally inter-predicted. Consequently, the reference PU 42 has a first motion vector 44 and a second motion vector 46. Motion vector 44 indicates a position in a reference picture 48. Reference picture 48 occurs in view V0 and at time instance T0. Motion vector 46 indicates a position in reference picture 50; reference picture 50 occurs in view V0 at time instance T3.
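As an illustration of the DoNBDV refinement described above, the following Python sketch locates the corresponding depth block and converts one of its corner depth values into a refined disparity. It assumes, for simplicity, that the maximum of the four corner samples is the selected depth value, that the disparity vector is expressed in whole-sample units, and that ref_depth and depth_to_disparity are hypothetical stand-ins for the reference depth picture and the camera-parameter-based conversion:

def donbdv_refine(nbdv_dv, pu_pos, pu_size, ref_depth, depth_to_disparity):
    # Minimal sketch of DoNBDV refinement; boundary clipping is omitted.
    # ref_depth: hypothetical 2D array of reference-view depth samples.
    # depth_to_disparity: hypothetical lookup table (derived from camera
    # parameters) mapping a depth value to a horizontal disparity.
    x, y = pu_pos
    w, h = pu_size
    dx, dy = nbdv_dv
    bx, by = x + dx, y + dy       # top-left of the corresponding depth block
    corners = [ref_depth[by][bx],
               ref_depth[by][bx + w - 1],
               ref_depth[by + h - 1][bx],
               ref_depth[by + h - 1][bx + w - 1]]
    depth = max(corners)
    # The refined disparity vector is horizontal only.
    return (depth_to_disparity[depth], 0)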
[0105] [0105] The video encoder can generate, based on the motion information of the reference PU 42, an IPMVC for inclusion in the merge candidate list of the current PU 40. The IPMVC may have a first motion vector 52 and a second motion vector 54. Motion vector 52 corresponds to motion vector 44 and motion vector 54 corresponds to motion vector 46. The video encoder generates the IPMVC such that a first reference index of the IPMVC indicates the position, in RefPicList0 of the current PU 40, of a reference picture (that is, reference index 56) that occurs at the same time instance as the picture in [0106] [0106] Thus, in the example of Figure 4, a disparity vector is calculated by finding the corresponding block 42, in a different view (for example, view 0 or V0), of the current PU 40 of the currently coded view (view 1 or V1). If the corresponding block 42 is not intra-coded and not inter-view predicted, and its reference picture has a POC value that is in the reference picture list of the current PU 40 (such as Ref0, List 0; Ref0, List 1; Ref1, List 1, as shown in Figure 4), then the motion information of the corresponding block 42 is used as the inter-view predicted motion vector. The video encoder can scale the reference index based on the POC. [0107] [0107] Also, when generating a merge candidate list (or, in some examples, an AMVP candidate list) for a block (a PU, for example), the video encoder can convert the block's disparity vector into an inter-view disparity motion vector candidate (IDMVC). The IDMVC can specify the block's disparity vector. The video encoder can add the IDMVC to the merge candidate list (or, in some examples, the AMVP candidate list) at a position different from that of the IPMVC. Alternatively, in some examples, the video encoder can add the IDMVC to the merge candidate list (or, in some examples, the AMVP candidate list) at the same position as the IPMVC when the IDMVC is available. In this context, either an IPMVC or an IDMVC can be called an "inter-view candidate". In some examples, in merge/skip mode, the video encoder always inserts the IPMVC, if available, before all spatial and temporal merge candidates in the merge candidate list. In some such examples, the video encoder can insert the IDMVC before the spatial merge candidate derived from A0. [0108] [0108] Thirumalai et al., "Merge candidates derivation from vector shifted candidates", Joint Collaborative Team on 3D Video Coding Extensions of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting: Vienna, AT, 27 July - 2 August 2013, document no. JCT3V-E0126 (hereinafter "JCT3V-E0126"), describes the derivation of merge candidates from vector shifted candidates. JCT3V-E0126 is available from http://phenix.it-sudparis.eu/jct3v/doc_end_user/current_document.php?id=1140. Due to the adoption of JCT3V-E0126, one more candidate, called a "shifted candidate" or "shifted IvMC", can be derived with a shifted disparity vector. Such a candidate can be an IPMVC derived from a reference block in a reference view using a shifted disparity vector, or can be derived from the first available spatial merge candidate that includes a disparity motion vector, or from the IDMVC. Detailed steps to generate the additional candidate and insert it into the merge candidate list are described as follows. [0109] [0109] First, a video encoder shifts the disparity vector DV by ((PuWidth/2*4+4), (PuHeight/2*4+4)). The video encoder uses the shifted DV to derive a shifted IvMC candidate from the reference view. Here, the current PU size is PuWidth x PuHeight.
If the shifted IvMC is available, the video encoder can skip step 2 (that is, the second step described below), and if this shifted IvMC is not identical to the IvMC derived without the disparity vector shift, the video encoder inserts the shifted IvMC into the merge candidate list immediately before the temporal merge candidates. [0110] [0110] Second, the video encoder can derive a candidate denoted the Disparity Shifted Motion Vector (DSMV) candidate. The video encoder can set the DSMV to be the additional candidate. If the DSMV is available, the video encoder can insert the DSMV into the merge candidate list at the same position as the shifted IvMC. The video encoder can derive the DSMV as follows. First, the video encoder identifies the first available disparity motion vector (DMV) corresponding to RefPicList0 from the spatial neighboring blocks. Second, if the DMV is available, the video encoder sets the horizontal component of the motion vector in List 0 to the DMV shifted by four, and the video encoder either keeps the vertical component of the motion vector unchanged or resets the vertical component of the motion vector to zero, depending on whether BVSP is enabled. The reference indices and the motion vectors in List 1 are inherited directly. Otherwise (that is, if the DMV is not available), the video encoder sets the horizontal components of the motion vectors in List 0 and List 1 to the DV shifted by four, and the video encoder sets both vertical components of the motion vectors in List 0 and List 1 to 0. [0112] [0112] In some 3D-HTM designs, BVSP mode is supported only for an inter-coded block in either skip mode or merge mode. BVSP mode is not allowed for a block coded in AMVP mode. Instead of signaling a flag to indicate the use of BVSP mode, an additional merge candidate (that is, the BVSP merge candidate) is introduced, and each candidate is associated with a BVSP flag. As indicated above, the video encoder 20 can signal a merge index (for example, merge_idx) in a bitstream, and the video decoder 30 can obtain the merge index from the bitstream. When the decoded merge index corresponds to a BVSP merge candidate, the current PU uses BVSP mode. Furthermore, when the decoded merge index corresponds to the BVSP merge candidate, for each sub-block within the current PU, the video encoder can derive a disparity motion vector for the sub-block by converting a depth value in a reference depth view. [0113] [0113] The setting of BVSP flags can be defined as follows. When a spatial neighboring block used to derive a spatial merge candidate is coded in BVSP mode, the associated motion information is inherited by the current block, as in conventional merge mode. Furthermore, this spatial merge candidate is tagged with a BVSP flag equal to 1. For the newly introduced BVSP merge candidate, the BVSP flag is set to 1. For all other merge candidates, the associated BVSP flags are set to 0. [0114] [0114] As noted above, in 3D-HEVC, a video encoder can derive a new candidate (that is, a BVSP merge candidate) and can insert the BVSP merge candidate into the merge candidate list. The video encoder can set the corresponding reference indices and motion vectors for the BVSP merge candidate by the following method. First, the video encoder can obtain the view index (denoted refVIdxLX) of the disparity vector derived by NBDV. Second, the video encoder can obtain the reference picture list RefPicListX (either RefPicList0 or RefPicList1) that is associated with the reference picture with the view order index
equal to refVIdxLX. The video encoder can use the reference index and the corresponding disparity vector from the NBDV process as the motion information of the BVSP merge candidate in RefPicListX. [0115] [0115] Third, if the current slice is a B slice, the video encoder can check the availability of an inter-view reference picture with a view order index (denoted refVIdxLY) not equal to refVIdxLX in the reference picture list other than RefPicListX (that is, RefPicListY, with Y being 1 - X). If such a different inter-view reference picture is found, the video encoder applies bi-predictive VSP. Meanwhile, the video encoder uses the reference index corresponding to the different inter-view reference picture and the scaled disparity vector from the NBDV process as the motion information of the BVSP merge candidate in RefPicListY. The video encoder can use the depth block of the view with view order index equal to refVIdxLX as the current block's depth information (in the case of texture-first coding order), and the video encoder can access the two different inter-view reference pictures (one from each reference picture list) via a backward-warping process, with weighting, in order to obtain the final backward VSP predictor. Otherwise, the video encoder applies uni-predictive VSP with RefPicListX as the reference picture list for prediction. [0116] [0116] In 3D-HTM, texture-first coding is applied in the common test conditions. Therefore, the corresponding non-base depth view is unavailable when decoding a non-base texture view. Therefore, the depth information is estimated and used to perform BVSP. In order to estimate the depth information for a block, a video encoder can first derive a disparity vector from neighboring blocks and then use the derived disparity vector to obtain a depth block from a reference view. In the 3D-HTM 8.0 test model, there is a process to derive a disparity vector predictor, known as NBDV (Neighboring Block Disparity Vector). Let (dvx, dvy) denote the disparity vector identified by the NBDV function, and let (blockx, blocky) denote the position of the current block. [0117] [0117] In some examples of uni-predictive BVSP, a video encoder fetches a depth block with top-left position (blockx + dvx, blocky + dvy) in the depth picture of the reference view. The current block is first divided into several sub-blocks, each with the same size W*H. For each sub-block of size W*H, the video encoder uses the corresponding depth sub-block within the fetched depth block and converts the maximum depth value of the four corner pixels of the depth sub-block into a disparity motion vector. The video encoder then uses the derived disparity motion vector of each sub-block for motion compensation. Figure 5 shows the three steps of how a depth block from the reference view is located and then used for BVSP (also called "BVSP prediction"). [0118] [0118] In particular, Figure 5 is a conceptual diagram that shows the derivation of depth blocks from a reference view to perform BVSP prediction. In some examples of bi-predictive BVSP, when there are multiple inter-view reference pictures from different views in RefPicList0 and RefPicList1, the video encoder applies bi-predictive VSP. That is, the video encoder can generate one VSP predictor from each reference picture list, as described above. The video encoder can then weight the two VSP predictors in order to obtain the final VSP predictor.
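The per-sub-block conversion described above can be illustrated with the following Python sketch, which divides the current block into WxH sub-blocks and converts the maximum of the four corner depth samples of each corresponding depth sub-block into a horizontal disparity motion vector. The helpers are the same hypothetical stand-ins as in the earlier DoNBDV sketch, and boundary clipping is omitted:

def bvsp_subblock_disparities(block_pos, block_size, nbdv_dv, ref_depth,
                              depth_to_disparity, W=8, H=4):
    # Minimal sketch of uni-predictive BVSP: locate the depth block pointed
    # to by the NBDV disparity vector, then convert the maximum of the four
    # corner depth samples of each WxH sub-block into a horizontal disparity
    # motion vector used for that sub-block's motion compensation.
    bx = block_pos[0] + nbdv_dv[0]
    by = block_pos[1] + nbdv_dv[1]
    mvs = {}
    for sy in range(0, block_size[1], H):
        for sx in range(0, block_size[0], W):
            corners = [ref_depth[by + sy][bx + sx],
                       ref_depth[by + sy][bx + sx + W - 1],
                       ref_depth[by + sy + H - 1][bx + sx],
                       ref_depth[by + sy + H - 1][bx + sx + W - 1]]
            mvs[(sx, sy)] = (depth_to_disparity[max(corners)], 0)
    return mvs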
[0119] [0119] In the example of Figure 5, a video encoder is coding the current texture picture 60. The current texture picture 60 is labeled a "dependent texture picture", since the current texture picture 60 depends on a synthesized reference texture picture 62. In other words, it may be necessary for the video encoder to synthesize the reference texture picture 62 (or parts thereof) in order to code the current texture picture 60. The reference texture picture 62 and the current texture picture 60 are in the same access unit but are in different views. [0120] [0120] In order to synthesize the reference texture picture 62 (or parts thereof), the video encoder can process blocks (that is, video units) of the current texture picture 60. In the example of Figure 5, the video encoder is processing the current block 64. When the video encoder processes the current block 64, the video encoder can perform the NBDV derivation process to derive a disparity vector for the current block 64. In the example of Figure 5, for instance, the video encoder identifies a disparity vector 66 of a block 68 that is a neighbor of the current block 64. The identification of the disparity vector 66 is shown as Step 1 of Figure 5. Furthermore, in the example of Figure 5, the video encoder determines, based on disparity vector 66, a disparity vector 69 for the current block [0122] [0122] The motion compensation size (that is, W*H as described above) used in BVSP can be either 8x4 or 4x8. To determine the motion compensation size, the following rule is applied. For each 8x8 block, the video encoder checks the four corners of the corresponding 8x8 depth block and: if (vdepth[TL] < vdepth[BR] ? 0 : 1) != (vdepth[TR] < vdepth[BL] ? 0 : 1), the 4x8 partition is used (W=4, H=8); otherwise, the 8x4 partition is used (W=8, H=4). Figure 6 is a conceptual diagram showing the four corner pixels of an 8x8 depth block. [0123] [0123] The maximum number of merge candidates and the merge list construction process for 3D-HEVC are described in the following paragraphs. In some versions of 3D-HEVC, the total number of candidates in the merge list is up to six, and five_minus_max_num_merge_cand is signaled in the slice header to specify the maximum number of merge candidates subtracted from five. five_minus_max_num_merge_cand is in the range of zero to 5, inclusive. five_minus_max_num_merge_cand specifies the maximum number of merging motion vector predictor (MVP) candidates (that is, merge candidates) supported in the slice subtracted from 5. A video encoder can compute the maximum number of merge MVP candidates (that is, MaxNumMergeCand) as: MaxNumMergeCand = 5 - five_minus_max_num_merge_cand + iv_mv_pred_flag[ nuh_layer_id ]   (H-1). In such versions of 3D-HEVC, the value of five_minus_max_num_merge_cand is limited so that MaxNumMergeCand is in the range of 0 to (5 + iv_mv_pred_flag[ nuh_layer_id ]), inclusive. [0124] [0124] Furthermore, in such versions of 3D-HEVC, a syntax element iv_mv_pred_flag[ layerId ] indicates whether inter-view motion parameter prediction is used in the decoding process of the layer with nuh_layer_id equal to layerId. iv_mv_pred_flag[ layerId ] equal to 0 specifies that inter-view motion parameter prediction is not used for the layer with nuh_layer_id equal to layerId. When not present, the value of iv_mv_pred_flag[ layerId ] is inferred to be equal to 0. [0125] [0125] The merge candidate list construction process in 3D-HEVC can be defined as follows: 1.
Insertion of the IPMVC: when inter-view motion prediction is applied, the video encoder derives an IPMVC by the procedure described above. If the IPMVC is available, the video encoder inserts the IPMVC into the merge list (that is, the merge candidate list). [0126] [0126] The design of the combined bi-predictive merge candidate derivation process in 3D-HEVC can have one or more potential problems. For example, the current design of the combined bi-predictive merge candidate derivation process in 3D-HEVC may require additional logic to check the BVSP flags of the first and second merge candidates used to construct a combined bi-predictive merge candidate. However, the additional checking of the BVSP flags does not help in terms of coding efficiency. Thus, the additional checking of the BVSP flags increases complexity. [0127] [0127] In another example of the potential problems associated with the combined bi-predictive merge candidate derivation process in 3D-HEVC, the re- [0128] [0128] One or more of the techniques of this disclosure relate to the combined bi-predictive merge candidate derivation process in 3D-HEVC. According to an exemplary technique of this disclosure, the design of the combined bi-predictive merge candidate derivation process in 3D-HEVC is replaced by the one used in HEVC. Therefore, there is no need to check BVSP flags in the combined bi-predictive merge candidate derivation process. In other words, the merge candidate list generation process takes place without checking any BVSP flags. Not checking the BVSP flags in the combined bi-predictive merge candidate derivation process can reduce the complexity of the encoding/decoding process without significantly affecting coding efficiency. [0129] [0129] In this way, this disclosure can provide a method to code data associated with 3D video. This method may comprise generating a merge candidate list to code a video block associated with the 3D video according to a merge list derivation process. The list includes one or more combined bi-predictive merge candidates. The merge list derivation process for the 3D video corresponds to the same merge list derivation process that is associated with non-3D video. [0131] [0131] Thus, in some instances, a video encoder can code data associated with 3D video. As part of coding the data, the video encoder can generate a merge candidate list to code a video block (a PU, for example) of the 3D video. As part of generating the merge candidate list, the video encoder can determine whether the number of merge candidates in the list is less than [0132] [0132] Alternatively, in some examples, before the combined bi-predictive merge candidate derivation process is invoked, the maximum number of merge MVP candidates, MaxNumMergeCand, is set as follows: MaxNumMergeCand = 5 - five_minus_max_num_merge_cand. After the combined bi-predictive merge candidate derivation process is invoked, MaxNumMergeCand is set back to its value as in 3D-HEVC: MaxNumMergeCand = 5 - five_minus_max_num_merge_cand + iv_mv_pred_flag[ nuh_layer_id ]. nuh_layer_id is the syntax element that specifies a layer identifier.
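As a concrete illustration of this adjust-and-restore of MaxNumMergeCand around the combined bi-predictive derivation, the following Python sketch can be considered. The function and variable names are illustrative only, and derive_combined_bipred is assumed to be a callable implementing the HEVC process summarized earlier:

def run_combined_bipred_with_adjusted_max(merge_list,
                                          five_minus_max_num_merge_cand,
                                          iv_mv_pred_flag,
                                          derive_combined_bipred):
    # Use the HEVC-style maximum (at most 5) while the combined
    # bi-predictive derivation runs.
    max_num_merge_cand = 5 - five_minus_max_num_merge_cand
    derive_combined_bipred(merge_list, max_num_merge_cand)
    # Afterwards, restore the 3D-HEVC maximum, which may be 6 when
    # inter-view motion prediction is enabled for the layer.
    max_num_merge_cand = (5 - five_minus_max_num_merge_cand
                          + iv_mv_pred_flag)
    return max_num_merge_cand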
Thus, in some such examples, before deriving the combined bi-predictive merge candidate or candidates, a video encoder may reset the maximum number of merge candidates to be equal to 5 minus the value of a first syntax element. The first syntax element specifies the maximum number of merge candidates supported in a slice subtracted from 5. After deriving the combined bi-predictive merge candidate or candidates, the video encoder can set the maximum number of merge candidates to 5 minus the value of the first syntax element plus the value of a second syntax element, where the second syntax element indicates whether inter-view motion parameter prediction is used in the decoding process of a layer. [0133] [0133] That is, when MaxNumMergeCand is equal to 6 and there are five candidates before the combined bi-predictive merge candidate derivation process in HEVC is invoked, a zero candidate (with reference index and motion vector components all equal to 0) is always generated and inserted into the merge candidate list, as specified in subclause 8.5.3.2.4 of HEVC Working Draft 10. [0134] [0134] Alternatively, the video encoder sets MaxNumMergeCand to 5 before invoking the process to determine combined bi-predictive merge candidates, and the video encoder considers only the first four candidates as input to this process. After the video encoder invokes the process to determine combined bi-predictive merge candidates, the video encoder puts the newly generated combined bi-predictive merge candidate, if available, at the end of the merge candidate list. Thus, the newly generated combined bi-predictive merge candidate follows the fifth candidate in the merge candidate list, which the video encoder did not consider as part of the input to the process to determine combined bi-predictive merge candidates. Later, in this example, MaxNumMergeCand is set back to 6. When the process for determining combined bi-predictive merge candidates does not produce a new combined bi-predictive merge candidate, the video encoder generates the zero candidate and inserts the zero candidate into the merge candidate list, as specified in subclause 8.5.3.2.4 of HEVC Working Draft 10. Subclause 8.5.3.2.4 of HEVC Working Draft 10 is reproduced below. [0135] [0135] Thus, in some examples in which the maximum number of merge candidates (MaxNumMergeCand, for example) is equal to 6, a video encoder may, in response to the determination that there are five merge candidates in the merge candidate list before any of the combined bi-predictive merge candidate or candidates are added to the list, include a zero candidate in the list. The motion vector components of the zero candidate are equal to 0 and the reference index of the zero candidate is equal to 0. [0136] [0136] The following section of this disclosure describes some exemplary implementation details compatible with the techniques of this disclosure in the context of 3D-HEVC. Shown below are changes to sections of 3D-HEVC Draft Text 1. Parts shown in bold underline or in underlined italics can correspond to additions to HEVC sections, and parts shown in italics enclosed in double square brackets (for example, [[text]]) can correspond to deletions. The techniques of this disclosure may correspond, in some examples, to the additions shown in bold underline and the deletions shown in italics enclosed in double square brackets.
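Because the techniques of this disclosure reuse the HEVC derivation, the following Python sketch illustrates the HEVC-style combined bi-predictive and zero-candidate generation referenced above, including the Table 1 ordering and both termination conditions. The Candidate layout and the simplified identity check are hypothetical; the sketch illustrates the behavior described in the text and is not a reproduction of the draft specification:

from collections import namedtuple

# A merge candidate reduced to the fields needed here (hypothetical layout).
Candidate = namedtuple("Candidate",
                       "has_l0 mv_l0 ref_l0 has_l1 mv_l1 ref_l1")

# Table 1 mapping: which existing candidate supplies the List 0 motion
# information and which supplies the List 1 motion information per combIdx.
L0_CAND_IDX = [0, 1, 0, 2, 1, 2, 0, 3, 1, 3, 2, 3]
L1_CAND_IDX = [1, 0, 2, 0, 2, 1, 3, 0, 3, 1, 3, 2]

def add_artificial_candidates(merge_list, max_num, num_ref_l0, num_ref_l1,
                              is_b_slice):
    num_orig = len(merge_list)
    if is_b_slice and num_orig > 1:
        for comb_idx in range(12):
            # Termination conditions given in the text above.
            if comb_idx == num_orig * (num_orig - 1):
                break
            if len(merge_list) == max_num:
                break
            c0 = merge_list[L0_CAND_IDX[comb_idx]]
            c1 = merge_list[L1_CAND_IDX[comb_idx]]
            if not (c0.has_l0 and c1.has_l1):
                continue
            # Skip the combination when its List 0 and List 1 motion
            # information would be identical (simplified check).
            if c0.ref_l0 == c1.ref_l1 and c0.mv_l0 == c1.mv_l1:
                continue
            merge_list.append(Candidate(True, c0.mv_l0, c0.ref_l0,
                                        True, c1.mv_l1, c1.ref_l1))
    # Zero motion vector candidates fill any remaining positions.
    num_ref = min(num_ref_l0, num_ref_l1) if is_b_slice else num_ref_l0
    zero_ref = 0
    while len(merge_list) < max_num:
        merge_list.append(Candidate(True, (0, 0), zero_ref,
                                    is_b_slice, (0, 0),
                                    zero_ref if is_b_slice else -1))
        if zero_ref < num_ref - 1:
            zero_ref += 1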
[0138] [0138] Figure 8 is a block diagram showing an exemplary video encoder 20 that can implement the techniques of this disclosure. Figure 8 is presented for purposes of explanation and should not be considered as limiting the techniques broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes the video encoder in the context of HEVC encoding. However, the techniques of this disclosure can be applied to other standards and encoding methods. [0139] [0139] In the example of Figure 8, the video encoder 20 includes a prediction processing unit 100, a video data memory 101, a unit of . waste generation 102, a transform processing unit 104, a quantification unit 106, a . an inverse quantization unit 108, an inverse transform processing unit 120, a reconstruction unit 112, a filter unit 114, a decoded picture buffer 116, and an entropy coding unit 118. The prediction processing unit 1CO0 includes an inter-prediction processing unit 120 and an intra-prediction processing unit 126. A. inter-prediction processing unit 120 includes a motion estimation unit 122 and a unit of . motion compensation 124. In other examples, video encoder 20 may include more, less, or different functional components. [0140] [0140] Video encoder 20 can receive video data. The video data memory 101 can store video data to be encoded by the components of the video encoder 20. The video data stored in the video data memory 101 can be obtained, for example, from the video source 18. The store of decoded pictures 116 may be a reference index memory which stores reference video data for use in the video data condition by video encoder 20, such as in intra- or inter-coding modes. Video data memory 101 and/or decoded picture store 116 can be formed by any of a variety of memory apparatus, such as dynamic random access memory. [0141] [0141] the video encoder 20 can encode each CTU in a slice of an image of the video data. Each of the CTUS can be associated with code tree blocks (CTBs) luma of equal size and with corresponding CTBs of the image. As part of the coding. of a CTU, the prediction processing unit 100 can perform mode quad-tree transformation partitioning. to divide the CTBs of the CU into progressively smaller blocks. Smaller blocks can be CU encoding blocks. For example, prediction processing unit 100 can partition a CTB associated with a CU into four subblocks of equal size, partition one or more of the subblocks into four subblocks of equal size, and so on. [0142] [0142] the video encoder 20 can encode CUs of a CTU so as to generate encoded representations of the CUs (i.e. encoded CUs). As part of an encoding of a CU, the prediction processing unit 100 may partition the encoding blocks associated with the CU among one or more PUs of the CU. Thus, in some examples each PU can be associated with a luma prediction block and corresponding chroma prediction blocks. Video encoder 20 and video decoder 30 can support PUs of various sizes. As indicated above, the size of a CU can refer to the size of the CU's luma coding block and the size of a PU can refer to the size of a PU's luma prediction block. Assuming the size of a specific CU is 2Nx2N, video encoder 20 and video decoder 30 can * support PU sizes of 2NxX2N or NxN, for interprediction and symmetric PU sizes of 2Nx2N, 2NxXN, Nx2N ' Or NXN or similar for interprediction. Video encoder 20 and video decoder 30 can also support asymmetric partitioning for PU sizes of 2NxnU, 2NxnD, nLx2N and nRx2N for inter-prediction. 
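To illustrate the PU partition sizes just listed, the following Python sketch enumerates, for a given CU size, the prediction block sizes produced by each partition mode. The mode names mirror common HEVC terminology; the function itself is illustrative only:

def pu_sizes(cu_size):
    # Prediction block sizes (width, height) for each partition mode of a
    # 2Nx2N CU. The asymmetric modes apply to inter-prediction only, and
    # NxN is typically restricted to the smallest CU size.
    n = cu_size // 2
    q = cu_size // 4
    return {
        "PART_2Nx2N": [(cu_size, cu_size)],
        "PART_2NxN":  [(cu_size, n), (cu_size, n)],
        "PART_Nx2N":  [(n, cu_size), (n, cu_size)],
        "PART_NxN":   [(n, n)] * 4,
        "PART_2NxnU": [(cu_size, q), (cu_size, cu_size - q)],
        "PART_2NxnD": [(cu_size, cu_size - q), (cu_size, q)],
        "PART_nLx2N": [(q, cu_size), (cu_size - q, cu_size)],
        "PART_nRx2N": [(cu_size - q, cu_size), (q, cu_size)],
    }

# For example, pu_sizes(32)["PART_2NxnU"] yields a 32x8 PU above a 32x24 PU.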
[0143] [0143] The inter-prediction prediction unit 120 can generate predictive data for a PU by performing inter-prediction on each PU of a CU, The predictive data . for the PU can include predictive blocks for the PU and motion information for the PU. to inter-prediction processing unit 120 can perform different operations for a PU of a CU depending on whether the PU is in an I-slice, a P-slice, or a B-slice. predicted. Consequently, if the PU is in an I-slice, the inter-prediction prediction unit 120 does not inter-predict on the PU. Thus, for mode II coded blocks, the predicted block is formed using spatial prediction from previously coded neighbor blocks within the same frame. [0144] [0144] If a PU is in a P slice, the motion estimation unit 122 can fetch the reference pictures in a list of reference pictures (“RefPicListO0”, for example) for a reference region for the PU. The reference region for the PU can be a region, within a reference image, that contains sample blocks that most closely correspond to the PU sample blocks. The furniture estimation unit [0145] [0145] If a PU is on a B-slice, the motion estimation unit 122 can perform either a prediction or a bi-prediction for the PU. To perform uniprediction for the PU, the motion estimation unit 122 can fetch the reference pictures from RefPicListO0 or from a second list of reference pictures (“RefPicListl1”) for a reference region for the PU. The motion estimating unit 122 can transmit, as the motion information of the PU, an indicator indicating a position in RefPicListd or RefFicListl of the reference image containing the reference region. A motion vector that indicates a special offset between a PU sample block and the reference location associated with the reference region, and one or more prediction direction indicators that indicate whether the reference image is in RefPicListO or RefPicListl. The processing unit 124 can generate the PU predictive blocks based, at least in part, on real or interpolated samples associated with the reference location indicated by the PU motion vector. [0146] [0146] To perform inter-prediction . bidirectional for a PU, the motion estimating unit 122 can fetch the reference images in RefPicList0 for a reference region for the PU which can also fetch the reference images in RefPicListI for another reference region for the PU, for the unit Motion estimation 122 can generate reference indices that indicate the positions in RefPicList0 and RefPicList1 of the reference images that contain the reference regions. In addition, the motion estimating unit 122 can . generate motion vectors that indicate spatial displacements between the reference locations associated with . reference regions and a prediction block (a sample block, for example) from the PU. PU motion information can include PU reference indices and motion vectors. The motion compensation unit 124 can generate the PU predictive blocks based, at least in part, on real or interpolated samples associated with the reference region indicated by the PU motion vectors. [0147] [0147] According to one or more techniques of this disclosure, the motion estimation unit 122 can generate a list of merge candidates to encode a 3D video video block. As part of generating the merge candidate list, the motion estimation unit 122 can determine if the number of merge candidates in the merge candidate list is less than 5. 
In response to the determination that the number of merge candidates if the list of fusion candidates is less than 5, the motion estimation unit 122 can derive one or more combined bi-predictive fusion candidates. The motion estimation unit 122 may include combined bi-predictive merge candidate or candidates in the merge candidate list. Furthermore, in some examples, the . movement estimation unit 122 can select a 'merger candidate from the merge candidate list. The 7 video encoder 20 can flag the position in the merge candidate list of the selected merge candidate. In some examples, the maximum number of merger candidates in the merger candidate list is equal to or greater than 5 (6, for example). [0148] [0148] Continuous reference is now made to the example in Figure 8. The intra-processing unit. prediction 126 can generate predictive data for a PU by performing intra-prediction on the PU. Predictive data for the 2 PU can include predictive blocks for the PU and various syntax elements. The intraprediction prediction unit 126 can perform intraprediction on PUs in slice 1, slice P and slice B. [0149] [0149] To perform intra-prediction on a PU, the intra-prediction processing unit 126 can use various intra-prediction modes to generate multiple sets of predictive data for the FU. To use some intra-prediction modes to generate a predictive dataset for the PU, the intra-prediction processing unit 2126 can extend samples from neighboring blocks through the PU predictive block in a direction associated with the intra-prediction mode. ., Neighboring PUs can be above and right, above and left or left of the PU, assuming a left-to-right, top-to-bottom encoding order for the PUs, CUs, and CTUsS. the intra-prediction processing unit 126 can utilize various numbers of intra-prediction modes, such as thirty-three directional add-on modes. In some examples, the number of intra-prediction modes may depend on the size of the region associated with the PU, [0150] [0150] The prediction processing unit 100 can select the predictive data for PUs of a CU from among the predictive data generated by the interprediction processing unit 120 for the PU or from the predictive data generated by the intra-processing unit. prediction 126 for the PUs. In some examples, prediction processing unit 100 selects predictive data for the CU PUs based on rate/distortion metrics from the predictive data sets. Predictive blocks of selected predictive data may here be referred to as selected predictive blocks. [9151] [9151] The waste generation unit 102' can generate, based on the coding blocks (Cb, Cr coding blocks, for example) of a CU and on the selected predictive blocks (predictive luma, Cb and Cr blocks, for example ) of the PUs of the CU, residual blocks (residual luma, residual blocks Cb and Cr, for example) of the CU. In other words, the waste generation unit 102 can generate a residual signal for the CU. For example, the waste generation unit 102 can generate the residual blocks of the CU so that each sample of the residual blocks has a value equal to differences between a sample in a CU coding block and a corresponding sample in a corresponding selected predictive block of a CU PU. [0152] [0152] The transform processing unit can perform quad-tree transform partitioning to partition the residual blocks associated with a CU into transform blocks that correspond to (ie, associated with) the TUs of the CU. Thus, a TU can be FB Lesson 870210042115, of 05/10/2021, p. 
106/204 associated with a luma transform block and two chroma transform blocks, The sizes and positions of the transform blocks (luma and chroma transform blocks, example DOF) of TUs of a CU may or may not be based on the sizes and positions of the prediction blocks of the CU PUs. A while known as “residual quad-tree transformation” (RQT) can include nodes associated with each of the TUs. The TUs of a CU can correspond to the leaf nodes of the RT. [0153] [0153] The transform processing unit 104 can generate blocks of transform coefficients for each TU of a CU by applying one or more transforms to the transform blocks of the TU. The transform processing unit 104 can apply several “transforms to a block of transforms associated with a TU. For example, the transform processing unit 104 can apply a discrete cosine transform (DCT), a directional transform, or a conceptually similar transform to a block transform. In some examples, the transform processing unit 104 does not apply transforms to a block of transforms. In such examples, the transform block can be treated as a transform coefficient block. [0154] [0154] The quantization unit 106 can quantize the transform coefficients in a block of transform coefficients. The quantization process can reduce to the bit depth associated with some or all of the transform coefficients of a block of transform coefficients. For example, an n-bit transform coefficient can be rounded to an m-bit transform coefficient during quantification, where n is greater than m. The unit of "Petition 870210042115, of 05/10/2021, p. 107/204 quantize 106 can quantify. A block of transform coefficients associated with a TU of a CU based on a quantization parameter (OP) value associated with the CU. Video encoder 20 can adjust the degree of quantization applied to blocks of transform coefficients associated with a CU by adjusting the QP value associated with the CU. Quantification can introduce information loss, so quantified transform coefficients may have lower precision than the original OS. [0155] [0155] The inverse quantization unit 108 and the inverse transform processing unit 110 can apply inverse quantization and inverse transforms to a block of transform coefficients, ] respectively, to reconstruct a residual block (i.e., a block of transforms ) from the block of 1 transform coefficients. The reconstruction unit 112 can reconstruct a coding block of a CU such that each sample of the coding block is equal to the sum of a sample of a predictive block of a PU of the CU and a corresponding sample of a block of transforms of a TU from CU. For example, the reconstruction unit 112 can add reconstructed residual blocks of TUs of a CU to corresponding samples of one or more predictive blocks of PUs of the CU generated by the prediction processing unit 100, so as to produce a reconstructed encoding block of CU. Thus, by reconstructing blocks of transforms for each TU of a CU in this way, the video encoder 20 can reconstruct the encoding blocks of the CU. [0156] [0156] The filter unit 114 can perform one or more unlocking operations to reduce the blocking artifacts in the code blocks associated with a CU. The decoded picture store 116 can store the reconstructed code blocks after the filter unit 114 performs the unlocking operation or operations on the reconstructed coding blocks. Thus, the decoded image store 116 may be a memory configured to store video data. 
The inter-prediction processing unit 120 can use a reference picture containing the reconstructed coding blocks to perform inter-prediction on PUs of other pictures. In addition, the intra-prediction prediction unit 126 can use reconstructed coding blocks in the decoded image store 116 to perform intra-prediction on other PUs in the same CU image. [0158] [0158] Figure 9 is a block diagram showing an exemplary video decoder 30 that is configured to implement the techniques of this disclosure. Figure 3 is presented for purposes of explanation and does not limit the techniques broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 30 in the context of HEVC encoding. However, the techniques of this disclosure can be applied to other coding standards or methods. [0159] [0159] In the example of Fig. 9, the video decoder 30 includes an entropy decoding unit 150, a video data memory 151, a prediction processing unit 152, an inverse quantization unit 154, an inverse quantization unit. inverse transform processing 156, a reconstruction unit 158, a filter unit 160 and a decoded image store 162. The prediction processing unit 152 includes a processing unit 164 and an intra-prediction processing unit 166. In other examples, the video decoder 39 may include more, less or different functional components. [0160] [0160] Video decoder 30 can receive a bit stream. The video data memory 151 can store video data, such as an encoded video bitstream to be encoded by the components of the video decoder 30. The video data stored in the video data memory 151, can be obtained, for example, from channel 16, such as from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing physical data storage media. The video data memory 151 can form a CPB encoded picture store which stores encoded video data of an encoded video bitstream. The decoded picture store 162 may be a reference picture memory that stores reference video data for use in decoding video data by the video decoder 30 in intra- or inter-condition modes, for example. The video data memory 151 and the decoded picture store 162 can be formed by any of a number of memory devices, such as . dynamic random access memory (DRAM), which includes Synchronous DRAM (SDRAM), magneto-resistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. decoded images 162 may be provided by the same memory device or by separate memory devices. In several examples, video data memory 101 may be built-in with other components of video encoder 20 or not built-in with respect to those components. [0161] [0161] The entropy decoding unit 150 can parse the bit stream in order to decode syntax elements of the bit stream. The entropy decoding unit 150 can entropy decode the entropy encoded syntax elements in the bit stream. The prediction processing unit 152, the inverse quantization unit 154, the inverse transform processing unit 156, the reconstruction unit 158 and the filter unit 160 can generate decoded video data based on the obtained (extracted) syntax elements , for example) of the bit stream. [0162] [0162] The bit stream may comprise a series of NAL units. Bitstream NAL units can include encoded slice NAL units. As part of bitstream decoding, the entropy decoding unit 150 can obtain (extract, for example) and entropy decode syntax elements from the encoded slice NAL units. Each of the encoded slices can include a slice header and slice data. 
The slice header can contain Syntax elements referring to a slice. The syntax elements in the slice header can include a syntax element that identifies a PPS associated with an image containing the . slice. [0163] [0163] In addition to obtaining (decoding, for example) syntax elements from the bitstream, the video decoder 30 can perform a reconstruction operation in cus. To perform the rebuild operation on a CU (an unpartitioned CU, for example), the video decoder 30 can perform a rebuild operation on each TU of the CU. By performing the rebuild operation for each TU of the CU, the video decoder 30 can rebuild residual blocks (i.e., blocks of transforms) of the TU's of the CU. [0164] [0164] As part of performing a reconstruction operation on a TU of a CU, the inverse quantization unit 154 can inverse quantize, i.e., dequantize, the coefficient blocks of (ie, associated with) the TU. The inverse quantization unit 154 may use a QP value associated with the CU of the TU in order to determine the degree of quantization and, likewise, the degree of inverse quantization for the inverse quantization unit 154 to apply. That is, the compression ratio, that is, the ratio of the number of bits used to represent the original sequence and the packed number, can be controlled by adjusting the value of what P is used when quantifying transform coefficients. The compression ratio may also depend on the entropy encoding method used. [0165] [0165] After the inverse quantization unit 154 inverse quantizes a block of coefficients, the inverse transform processing unit 156 can apply one or more inverse transforms to the coefficient block in order to generate a radio signal block associated with the TU . For example, the inverse transform processing unit may apply an inverse DCT, an inverse integer transform, an inverse Karhumen-Loeve (KLT) transform, an inverse rotational transform, an inverse directional transform, or another inverse transform to the coefficient block. [0166] [0166] If a PU is coded using intra-prediction, the intra-prediction processing unit 166 can perform intra-prediction in order to generate predictive blocks for the PU. For example, the intra-prediction processing unit 166 can use the intra-prediction mode to generate the predictive luma, Cb and Cr blocks for the PU based on the prediction blocks of spatially neighboring PUs. The intra-prediction unit 166 can determine the intra-prediction mode for the PU based on one or more syntax elements decoded from the bit stream. [0167] [0167] The prediction processing unit 162 can build a first reference picture list (RefPicList0) and a second reference picture list (RefPicListI), based on syntax elements obtained from the bit stream. Furthermore, if a PU is encoded using intra-prediction, the entropy decoding unit 150 can determine (extract, for example) motion information for the PU. The motion compensation unit 164 can determine, based on motion information from the PU, one or more reference blocks for the PU. The motion compensation unit 164 can generate, based on blocks of samples in the block or reference blocks for the PU, predictive blocks (predictive luma, Cb and Cr blocks, for example) for the PU. [0168] [0168] As noted above, the video encoder 20 can signal the motion information of a PU using merge mode, hop mode or AMVF mode. 
When the video encoder 20 signals the motion information of the current PU using the Í AMVP mode, the vor entropy decoding unit 150 can decode, from the bit stream, a reference index, an MVD for the current PU and the candidate index. Furthermore, the motion compensation unit 164 can generate a list of AMVFE candidates for the current PU. The AMVP candidate list includes one or more motion vector predictor candidates. Each of the motion vector candidates specifies a motion vector of a PU that is special or temporally neighboring the current PU. The motion compensation unit 164 can determine, based at least in part on the candidate index, a motion vector predictor candidate selected from the AMVP candidate list. The motion compensation unit 164 can then determine the motion vector of the current PU by adding the MVD to the motion vector specified by the selected motion vector predictor candidate. In other words, for AMVP, the motion vector is calculated as motion vector (MV) = MVP + MVD, where the motion vector predictor index (MVP) is signaled and the MVP is one of the candidate vector of movement (spatial or temporal) of the AMVP list, and the MVD is signaled to the decoder side. [01689] [01689] If the current PU is bi-predicted and the PU motion information is signaled in AMVP mode, the vor entropy decoding unit 150 can decode an additional reference index, MVD, and the candidate index from the stream of bits. The motion compensation unit 162 can repeat the process described above using the additional reference index, MVD, and the candidate index in order to derive a second motion vector for the current PU. In this way, the motion compensation unit 162 can derive the motion vector for RefFicListO (i.e. a motion vector RefPicListO0) and a motion vector for RefPicListl (i.e., a motion vector RefPicListI). [0170] [0170] According to one or more techniques of this disclosure, motion compensation unit 164 can generate a list of merge candidates to encode a 3D video block. As part of the generation of the merger candidate list, the motion clearing unit 164 can determine if the number of merger candidates in the merger candidate list is less than 5. In response to the determination that the number of merger candidates if the list of merge candidates is less than 5, the motion compensation vanity 164 can derive one or more merged bi-predictive merge candidates. The motion compensation unit 164 may include the combined bi-predictive merge candidate or candidates in the merge candidate list. Also, in some examples, Video decoder 30 may obtain, from a bitstream, a syntax element that indicates a merge candidate selected from the merge candidate list. The motion compensation unit 164 can use motion information of the selected candidate to generate predictive samples of the current PU. In some examples the maximum number of merger candidates in the list of merger candidates is equal to or greater than 5 (6, for example). [0172] [0172] The filter unit 160 can perform an unlocking operation to reduce the blocking artifacts associated with the coding blocks (luma, Cb and Cr coding blocks, for example) of the CU. The video decoder 30 can store the coding blocks (luma, Ch and Cr coding blocks, for example) of the CU in the decoded picture store 162. The decoded picture store 162 can provide reference pictures for motion compensation, intra - subsequent prediction and presentation in a display apparatus, such as the display apparatus 32 of Figure 1. 
For example, the video decoder 30 can perform, based on the blocks (luma, Cb and Cr encoding blocks, for example) in the decoded image store 162, intra-prediction or inter-prediction operations on PUs of other CUs. In this way, the video decoder 30 can obtain from the bit stream significant levels of transform coefficients of the block of coefficients uma, quantify by inversion the levels of transform coefficients, apply a transform to the transform coefficient levels so as to generating a transform block, generating, header at least in part in the transform block, an encoding block, and transmitting the encoding block for display. [0174] [0174] In the example of Fig. 10A, the video encoder 20 can generate a list of fusion candidates (200). In other words, the video encoder 20 can generate a list of merge candidates. Figures 11 and 12, described elsewhere in this disclosure, show an exemplary operation for generating the list of fusion candidates. In some examples, video encoder 20 may generate the merge candidate list in the same manner as video decoder 30. According to one or more techniques of this disclosure, when video encoder 20 generates the merge candidate list , the video encoder can determine if the number of fusion candidates in the fusion candidate list is less than 5. In response to the determination that the number of fusion candidates in the fusion candidate list is less than 5, the encoder of video 20 can derive one or more candidates for bi-predictive fusion. Video encoder 20 may include the bi-predictive fusion candidate or candidates in the fusion candidate list. In some examples, the maximum number of merger candidates in the list of merger candidates is 6. [0175] [0175] Furthermore, in the example of Figure 10A, the video encoder 20 can select a cardidato from the list of merge candidates (202). In some examples, Video encoder 20 can signal the selected candidate in a bit stream. For example, video encoder 20 can include a merge index syntax element in the bit stream. Video encoder 20 can encode a video block based on the selected candidate (204). For example, the video block could be a CU. In this example, the video encoder 20 can use the motion information (such as motion vectors, reference indices, etc.) of the selected candidate in order to determine a predictive block for a CU PU. Furthermore, in this example, the video encoder 209 can determine values of at least some samples of a transform block (a residual block, for example) on the basis of samples of the predictive block and corresponding samples of a coding block of the CU. For example, video encoder 70 can determine values of at least some of the transform block samples so that the samples are equal to the differences between the predictive block samples and the corresponding samples of a Cu coding block. [0176] [0176] Figure 10B is a flowchart showing an exemplary operation of video decoder 30 to decode data associated with 3D video, in accordance with one or more techniques of this disclosure. In the example of Fig. 108B, video decoder 30 can generate a list of fusion candidates (220), in other words, and video decoder 30 can generate a list of fusion candidates. Figures 11 and 12, described elsewhere in this disclosure, show an exemplary operation for generating the list of fusion candidates. In some examples, video decoder 30 may generate the merge candidate list in the same manner as video encoder 20. 
According to one or more techniques of this disclosure, when the video decoder 30 generates the merge candidate list, the video decoder 30 can determine whether the number of merge candidates in the merge candidate list is less than 5 and, in response to the determination that the number of merge candidates in the merge candidate list is less than 5, the video decoder 30 can derive one or more combined bi-predictive merge candidates. The video decoder 30 may include the combined bi-predictive merge candidate or candidates in the merge candidate list. In some examples, the maximum number of merge candidates in the merge candidate list is equal to 6. [0177] Furthermore, in the example of Figure 10B, the video decoder 30 can determine a selected candidate in the merge candidate list (222). In some examples, the video decoder 30 can determine the selected candidate based on the value indicated by a syntax element signaled in a bit stream. The video decoder 30 can decode a video block based on the selected candidate (224). For example, the video block could be a CU. In this example, the video decoder 30 can use the motion information (such as motion vectors, reference indices, etc.) of the selected candidate in order to determine a predictive block for a PU of the CU. Furthermore, in this example, the video decoder 30 can determine values of at least some of the samples of a coding block of the CU based on the predictive block. For example, the video decoder 30 can determine values of at least some of the samples of the coding block so that the samples are equal to the sum of the samples of the predictive block and the corresponding samples of a transform block of a TU of the CU. [0178] Figure 11 is a flowchart showing a first part of an exemplary operation 300 to build a merge candidate list for the current block, in accordance with one or more techniques of this disclosure. [0179] In the example of Figure 11, a video coder (the video encoder 20 or the video decoder 30, for example) can determine an IPMVC (302). In some examples, the video coder can determine the IPMVC using a disparity vector for the current block in order to identify a corresponding block in an inter-view reference picture. In such examples, if the corresponding block is not intra-predicted and not inter-view predicted and has a temporal motion vector (that is, a motion vector that indicates a location in a reference picture associated with a time instance different from that of the corresponding block), the IPMVC can specify the motion vectors of the corresponding block, the prediction direction indicators of the corresponding block, and converted reference indices of the corresponding block. The video coder can then determine whether the IPMVC is available (304). In some examples, the IPMVC is unavailable if the corresponding block in the inter-view reference picture is intra-predicted or outside the boundaries of the inter-view reference picture. In response to the determination that the IPMVC is available ("YES" of 304), the video coder may insert the IPMVC into the merge candidate list (306). [0180] After inserting the IPMVC into the merge candidate list or in response to the determination that the IPMVC is not available ("NO" of 304), the video coder can check the spatial neighbor PUs to determine whether the spatial neighbor PUs have available motion vectors (308). In some examples, the spatial neighbor PUs cover the locations indicated as A0, A1, B0, B1, and B2 in Figure 2.
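Figure 2 itself is not reproduced here; in HEVC these five locations are conventionally defined relative to the top-left luma sample of the current prediction block, and the sketch below assumes those customary offsets (an assumption insofar as this disclosure's Figure 2 could differ).

```cpp
// Illustrative sketch of the five spatial neighbour locations (A0, A1, B0,
// B1, B2) commonly used for HEVC merge candidates, relative to a prediction
// block with top-left luma sample (xPb, yPb) and size nPbW x nPbH.
#include <array>
#include <iostream>

struct Position { int x; int y; };

std::array<Position, 5> spatialNeighbourPositions(int xPb, int yPb,
                                                  int nPbW, int nPbH) {
    return {{
        { xPb - 1,        yPb + nPbH     },   // A0: below-left
        { xPb - 1,        yPb + nPbH - 1 },   // A1: left
        { xPb + nPbW,     yPb - 1        },   // B0: above-right
        { xPb + nPbW - 1, yPb - 1        },   // B1: above
        { xPb - 1,        yPb - 1        }    // B2: above-left
    }};
}

int main() {
    const char* names[5] = { "A0", "A1", "B0", "B1", "B2" };
    std::array<Position, 5> pos = spatialNeighbourPositions(64, 32, 16, 16);
    for (int i = 0; i < 5; ++i)
        std::cout << names[i] << ": (" << pos[i].x << ", " << pos[i].y << ")\n";
    return 0;
}
```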
For ease of explanation, this disclosure may refer to the motion information of the PUs covering the locations A0, A1, B0, B1, and B2 as A0, A1, B0, B1, and B2, respectively. [0181] In the example of Figure 11, the video coder can determine whether A1 corresponds to the IPMVC (310). In response to the determination that A1 does not correspond to the IPMVC ("NO" of 310), the video coder may insert A1 into the merge candidate list (312). Otherwise, in response to the determination that A1 corresponds to the IPMVC ("YES" of 310) or after inserting A1 into the merge candidate list, the video coder can determine whether B1 corresponds to A1 or to the IPMVC (314). In response to the determination that B1 does not correspond to A1 or to the IPMVC ("NO" of 314), the video coder can insert B1 into the merge candidate list (316). On the other hand, in response to the determination that B1 corresponds to A1 or to the IPMVC ("YES" of 314), the video coder does not insert B1 into the merge candidate list. [0182] Figure 12 is a flowchart showing a second part of the exemplary operation 300 of Figure 11 to build a merge candidate list for the current block, in accordance with one or more techniques of this disclosure. As noted above, the video coder can perform the part of operation 300 shown in Figure 12 if the IDMVC is not available or if the IDMVC corresponds to A1 or B1. Consequently, if the IDMVC is not available or if the IDMVC corresponds to A1 or B1 ("NO" of 332), or after inserting the IDMVC into the merge candidate list, the video coder can determine whether BVSP is enabled (336). If BVSP is enabled ("YES" of 336), the video coder can insert a BVSP candidate into the merge candidate list (338). If BVSP is not enabled ("NO" of 336) or after inserting the BVSP candidate into the merge candidate list, the video coder can determine whether A0 is available (340). If A0 is available ("YES" of 340), the video coder can insert A0 into the merge candidate list (342). Otherwise, if A0 is not available ("NO" of 340) or after inserting A0 into the merge candidate list, the video coder can determine whether B2 is available (344). If B2 is available ("YES" of 344), the video coder can insert B2 into the merge candidate list (346). [0183] If B2 is not available ("NO" of 344) or after inserting B2 into the merge candidate list, the video coder can determine whether inter-view motion prediction is applied (348). In other words, the video coder can determine whether the current block can be coded using inter-view motion prediction. In response to the determination that inter-view motion prediction is applied ("YES" of 348), the video coder may determine a shifted candidate (350). In other words, the video coder can determine a DSMV candidate, as described elsewhere in this disclosure. After determining the shifted candidate, the video coder can determine whether the shifted candidate is available (352). If the shifted candidate is available ("YES" of 352), the video coder can include the shifted candidate in the merge candidate list (354). If inter-view motion prediction is not applied ("NO" of 348), if the shifted candidate is not available ("NO" of 352), or after including the shifted candidate in the merge candidate list, the video coder may include a temporal merge candidate in the merge candidate list (356). [0184] In addition, the video coder can perform a derivation process for combined bi-predictive merge candidates (358).
An exemplary derivation process for combined bi-predictive merge candidates according to one or more techniques of this disclosure is described below with reference to Figure 13. In addition, the video coder can perform a derivation process for zero motion vector candidates (360). An exemplary derivation process for zero motion vector candidates is described in section 8.5.3.2.4 of HEVC WD 10. [0185] Figure 13 is a flowchart showing an exemplary derivation process for combined bi-predictive merge candidates according to one or more techniques of this disclosure. The derivation process of Figure 13 can be performed without checking any BVSP indicators. For example, the derivation process of Figure 13 can be performed without providing mergeCandIsVspFlag as an input to the derivation process for combined bi-predictive merge candidates, as is done in section H. [0186] In the example of Figure 13, a video coder (the video encoder 20 or the video decoder 30, for example) can determine whether the current slice (that is, the slice that the video coder is currently coding) is a B slice (400). If the current slice is not a B slice ("NO" of 400), the video coder may terminate the derivation process for combined bi-predictive merge candidates. However, in response to the determination that the current slice is a B slice ("YES" of 400), the video coder can determine whether the number of merge candidates in the merge candidate list is less than 5 (402). If the number of merge candidates in the merge candidate list is not less than 5, the video coder can terminate the derivation process for combined bi-predictive merge candidates. [0187] On the other hand, in response to the determination that the number of merge candidates in the merge candidate list is less than 5 ("YES" of 402), the video coder may set the value of a combination index (combIdx, for example) to zero (404). The video coder can then determine whether motion vectors corresponding to the current value of the combination index are available (406). [0189] In addition, the video coder can determine whether the current value of the combination index is equal to (numOrigMergeCand * (numOrigMergeCand - 1)), where numOrigMergeCand denotes the number of candidates in the merge candidate list before the derivation process of Figure 13 is invoked (410). If the current value of the combination index is equal to (numOrigMergeCand * (numOrigMergeCand - 1)) ("YES" of 410), the video coder can terminate the derivation process for combined bi-predictive merge candidates. On the other hand, if the current value of the combination index is not equal to (numOrigMergeCand * (numOrigMergeCand - 1)) ("NO" of 410), the video coder can determine whether the total number of merge candidates in the merge candidate list is equal to MaxNumMergeCand (412). As indicated elsewhere in this disclosure, MaxNumMergeCand indicates the maximum number of merge candidates in the merge candidate list. If the total number of merge candidates in the merge candidate list is equal to MaxNumMergeCand ("YES" of 412), the video coder can terminate the derivation process for combined bi-predictive merge candidates.
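The loop of Figure 13 can be summarized in code. The sketch below is a non-normative reading of that loop under stated assumptions: the candidate pairing order follows the customary HEVC combIdx tables, the identity check between the two motion vectors is simplified, and all structure and function names (MergeCandidate, hasList0, sameMotion, and so on) are illustrative rather than taken from the 3D-HEVC text.

```cpp
// Non-normative sketch of the combined bi-predictive merging candidate
// derivation summarized by Figure 13. Pairing order follows the customary
// HEVC combIdx tables; names and the identity check are assumptions.
#include <cstddef>
#include <iostream>
#include <vector>

struct MvField { int mvx = 0, mvy = 0, refIdx = -1; };   // refIdx < 0: list unused

struct MergeCandidate {
    MvField list0, list1;                 // motion for RefPicList0 / RefPicList1
    bool hasList0() const { return list0.refIdx >= 0; }
    bool hasList1() const { return list1.refIdx >= 0; }
};

static bool sameMotion(const MvField& a, const MvField& b) {
    return a.refIdx == b.refIdx && a.mvx == b.mvx && a.mvy == b.mvy;
}

void deriveCombinedBiPredCandidates(std::vector<MergeCandidate>& mergeList,
                                    bool isBSlice, std::size_t maxNumMergeCand) {
    // Per the disclosure, the process runs only for B slices and only when
    // fewer than 5 candidates are already present in the list.
    if (!isBSlice || mergeList.size() >= 5 || mergeList.size() < 2) return;

    const std::size_t numOrig = mergeList.size();        // numOrigMergeCand
    // Customary HEVC pairing order (list-0 / list-1 candidate index per combIdx).
    static const int l0Idx[12] = {0, 1, 0, 2, 1, 2, 0, 3, 1, 3, 2, 3};
    static const int l1Idx[12] = {1, 0, 2, 0, 2, 1, 3, 0, 3, 1, 3, 2};

    for (std::size_t combIdx = 0;
         combIdx < numOrig * (numOrig - 1) && mergeList.size() < maxNumMergeCand;
         ++combIdx) {
        const MergeCandidate c0 = mergeList[l0Idx[combIdx]];
        const MergeCandidate c1 = mergeList[l1Idx[combIdx]];
        // The two motion vectors must refer to pictures in different reference
        // picture lists and must not describe identical motion.
        if (!c0.hasList0() || !c1.hasList1()) continue;
        if (sameMotion(c0.list0, c1.list1)) continue;     // simplified identity check
        MergeCandidate combined;
        combined.list0 = c0.list0;                        // list-0 motion from the first candidate
        combined.list1 = c1.list1;                        // list-1 motion from the second candidate
        mergeList.push_back(combined);
    }
}

int main() {
    std::vector<MergeCandidate> list(2);
    list[0].list0 = {3, -1, 0};                           // uni-predicted from list 0
    list[1].list1 = {-2, 4, 1};                           // uni-predicted from list 1
    deriveCombinedBiPredCandidates(list, /*isBSlice=*/true, /*maxNumMergeCand=*/6);
    std::cout << "merge candidates after derivation: " << list.size() << "\n";
    return 0;
}
```

In the terms used above, the loop bound numOrigMergeCand * (numOrigMergeCand - 1) plays the role of the test at step 410, and the comparison of the list size against maxNumMergeCand plays the role of the test at step 412.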
[0191] Figure 14A is a flowchart showing an exemplary operation of the video encoder 20 to encode a video block, in accordance with one or more techniques of this disclosure. [0192] In some examples, the video encoder 20 may derive the combined bi-predictive merge candidate or candidates after inserting an IPMVC, if available, into the merge candidate list, after performing a derivation process for spatial merge candidates, and after performing a derivation process for a temporal merge candidate. A derivation process for spatial merge candidates can derive and insert up to four spatial motion vector candidates into the merge candidate list. The derivation process for the temporal merge candidate can add a temporal motion vector predictor (TMVP) candidate, if available, to the merge candidate list. [0193] Furthermore, in the example of Figure 14A, the video encoder 20 can select a candidate from the merge candidate list (460). In some examples, the video encoder 20 can determine the selected candidate based on the value indicated by a syntax element signaled in a bit stream. In addition, the video encoder 20 can signal the position, in the merge candidate list, of the selected merge candidate (452). The video encoder 20 can encode a video block based on the selected candidate (564). The video encoder 20 can encode the video block in accordance with one or more of the examples presented elsewhere in this disclosure. [0194] Figure 14B is a flowchart showing an exemplary operation of the video decoder 30 to decode a video block, in accordance with one or more techniques of this disclosure. In the example of Figure 14B, the video decoder 30 can generate a merge candidate list (480). In the example of Figure 14B, the video decoder 30 can determine whether the number of merge candidates in the list is less than 5 (482). In some examples, the video decoder 30 can, in this step, determine whether the number of merge candidates in the list is less than 5, the maximum number of merge candidates in the list being equal to 6, for example. In response to the determination that the number of merge candidates in the list is less than 5 ("YES" of 482), the video decoder 30 may derive one or more combined bi-predictive merge candidates (484). Each respective combined bi-predictive merge candidate of the combined bi-predictive merge candidate or candidates may correspond to a respective pair of merge candidates already in the list. The respective combined bi-predictive merge candidate may be a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, wherein the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to pictures in different reference picture lists (list 0 and list 1, for example). The video decoder 30 may include the combined bi-predictive merge candidate or candidates in the list (486). On the other hand, in some examples, if the number of merge candidates in the list is not less than 5 ("NO" of 482), the video decoder 30 does not include any combined bi-predictive merge candidate in the list (488). [0195] In some examples, the video decoder 30 may derive the combined bi-predictive merge candidate or candidates after inserting an IPMVC, if available, into the merge candidate list, after performing a derivation process for spatial merge candidates, and after performing a derivation process for a temporal merge candidate.
The derivation process for the temporal merge candidate can add a temporal motion vector predictor (TMVP) candidate, if available, to the merge candidate list. [0196] In addition, in the example of Figure 14B, the video decoder 30 can determine a selected candidate in the merge candidate list (490). In some examples, the video decoder 30 can determine the selected candidate based on the value indicated by a syntax element signaled in a bit stream. For example, the video decoder 30 may obtain, from a bit stream, a syntax element that indicates a merge candidate selected from the merge candidate list. The video decoder 30 can decode a video block based on the selected candidate (492). For example, the video decoder 30 may use the motion information of the selected candidate to generate predictive samples of the current PU. The video decoder 30 can decode the video block (such as a CU, PU, etc.) in accordance with one or more of the examples presented elsewhere in this disclosure. [0197] The following paragraphs present additional examples of this disclosure. [0198] Example 1. A method for encoding video data, the method comprising: generating a first list of merge candidates according to a first process for encoding a video block that is not associated with three-dimensional video data, wherein the first list includes one or more bi-predictive merge candidates; and generating a second list of merge candidates according to a second process for encoding a video block that is associated with three-dimensional video data, wherein the second list includes one or more bi-predictive merge candidates, and wherein the first process and the second process are the same. [0199] Example 2. The method of example 1, wherein the generation of the first list and the generation of the second list occur only when the following condition is satisfied: the number of candidates available for merging is less than 5. [0200] Example 3. The method of any one of examples 1 or 2, further comprising defining the maximum number of merge MVP candidates before invoking a derivation process to generate any merge list. [0201] Example 4. The method of example 3, wherein the maximum number of merge MVP candidates is defined substantially as follows: MaxNumMergeCand = 5 - five_minus_max_num_merge_cand, and then, after that process is invoked, MaxNumMergeCand is reset to: MaxNumMergeCand = 5 - five_minus_max_num_merge_cand + iv_mv_pred_flag[ nuh_layer_id ]. [0202] Example 5. A method for encoding data associated with three-dimensional (3D) video, the method comprising: generating a merge candidate list for encoding a video block associated with 3D video, wherein the list includes one or more combined bi-predictive merge candidates and wherein, when the maximum number of merge candidates is equal to 6 and there are 5 candidates before a combined bi-predictive merge candidate derivation process is invoked, a zero candidate is generated and added to the list, wherein a zero candidate sets a reference index and motion vector components to 0. [0203] Example 6.
A method for encoding data associated with three-dimensional (3D) video, the method comprising: generating a merge candidate list for encoding a video block associated with 3D video, wherein the list includes one or more bi-predictive merge candidates and wherein, before the list is generated, the maximum number of merge candidates is fixed at 5, four of the candidates are input to a merge list derivation process, and one candidate is newly generated during the merge list derivation process. [0204] Example 7. The method of example 6, wherein the newly generated candidate is ordered as the fifth candidate in the list. [0205] Example 8. The method of example 6, wherein, if the merge list derivation process is unable to generate a newly generated non-zero candidate, the merge list derivation process generates a zero-valued candidate as the newly generated candidate. [0206] In one or more examples, the functions described herein can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions can be stored on, or transmitted over, a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media, which includes any medium that facilitates the transfer of a computer program from one place to another, for example according to a communication protocol. In this way, computer-readable media can generally correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. [0207] By way of example, and not limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Furthermore, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to tangible, non-transient storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. [0209] The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless telephone handset, an integrated circuit (IC), or a set of ICs (a chipset, for example).
Various components, modules, or units are described in this disclosure to emphasize functional aspects of apparatuses configured to perform the described techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, several units can be combined into a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, together with appropriate software and/or firmware.
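As a final illustration of how the pieces described with reference to Figures 11 through 13 fit together, the sketch below outlines one possible candidate-list construction order. It is only a sketch under assumptions: every helper is a placeholder stub rather than 3D-HEVC reference code, the exact positions of the B0 and IDMVC checks are assumed from the customary 3D-HEVC ordering because the corresponding steps are not reproduced above, and the maximum list size of 6 is the value discussed in this disclosure.

```cpp
// Non-normative sketch of the overall merge candidate list construction
// order discussed with reference to Figures 11-13. All helper functions are
// placeholder stubs (assumptions) returning "not available" or doing nothing.
#include <cstddef>
#include <iostream>
#include <optional>
#include <vector>

struct MergeCandidate { /* motion vectors, reference indices, etc. */ };
using MaybeCand = std::optional<MergeCandidate>;

// Placeholder stubs for the derivation steps described in the text above.
MaybeCand deriveIpmvc()                { return std::nullopt; }  // steps 302-306
MaybeCand spatialCandidate(int /*i*/)  { return std::nullopt; }  // A1, B1, B0, A0, B2 (after pruning)
MaybeCand deriveIdmvc()                { return std::nullopt; }  // pruned against A1 / B1
MaybeCand deriveBvspCandidate()        { return std::nullopt; }  // steps 336-338
MaybeCand deriveShiftedCandidate()     { return std::nullopt; }  // DSMV, steps 348-354
MaybeCand deriveTemporalCandidate()    { return std::nullopt; }  // TMVP, step 356
void deriveCombinedBiPredCandidates(std::vector<MergeCandidate>&, bool, std::size_t) {}
void appendZeroCandidates(std::vector<MergeCandidate>&, std::size_t) {}   // step 360

std::vector<MergeCandidate> buildMergeList(bool bvspEnabled,
                                           bool interViewMotionPrediction,
                                           bool isBSlice) {
    const std::size_t kMaxNumMergeCand = 6;           // maximum discussed in this disclosure
    std::vector<MergeCandidate> list;
    auto add = [&list](const MaybeCand& c) { if (c) list.push_back(*c); };

    add(deriveIpmvc());                                // inter-view predicted motion vector candidate
    add(spatialCandidate(0));                          // A1
    add(spatialCandidate(1));                          // B1
    add(spatialCandidate(2));                          // B0 (position assumed)
    add(deriveIdmvc());                                // IDMVC (position assumed)
    if (bvspEnabled) add(deriveBvspCandidate());
    add(spatialCandidate(3));                          // A0
    add(spatialCandidate(4));                          // B2
    if (interViewMotionPrediction) add(deriveShiftedCandidate());
    add(deriveTemporalCandidate());

    // Figure 13: only invoked for B slices with fewer than 5 candidates.
    deriveCombinedBiPredCandidates(list, isBSlice, kMaxNumMergeCand);
    appendZeroCandidates(list, kMaxNumMergeCand);
    return list;
}

int main() {
    std::vector<MergeCandidate> list = buildMergeList(true, true, true);
    std::cout << "candidates in sketch list: " << list.size() << "\n";
    return 0;
}
```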
[0210] Several examples have been described. These and other examples are within the scope of the following claims.
Claims (24) [1] 1. A method for coding three-dimensional (3D) video data, the method comprising: generating a merge candidate list for coding a video block of the 3D video data, wherein the maximum number of merge candidates in the merge candidate list is equal to 6 and generating the merge candidate list comprises: determining whether the number of merge candidates in the merge candidate list is less than 5; and in response to the determination that the number of merge candidates in the merge candidate list is less than 5: deriving one or more combined bi-predictive merge candidates, wherein each respective combined bi-predictive merge candidate of the combined bi-predictive merge candidate or candidates corresponds to a respective pair of merge candidates already in the merge candidate list, wherein the respective combined bi-predictive merge candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, and wherein the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to pictures in different reference picture lists; and including the combined bi-predictive merge candidate or candidates in the merge candidate list. [2] The method of claim 1, wherein generating the merge candidate list further comprises: in response to determining that there are 5 merge candidates in the merge candidate list before adding any of the combined bi-predictive merge candidate or candidates to the merge candidate list, adding a zero candidate to the merge candidate list, wherein the motion vector components of the zero candidate are equal to zero and the reference index of the zero candidate is equal to zero, the reference index indicating a location of a reference picture in a reference picture list. [3] The method of claim 2, wherein the generation of the merge candidate list takes place without checking any backward view synthesis prediction (BVSP) indicators. [4] The method of claim 1, wherein the method for coding the data comprises a method for decoding the 3D video data, and the video block is a prediction unit (PU), the method further comprising: obtaining, from a bit stream, a syntax element that indicates a merge candidate selected from the merge candidate list; and using the motion information of the selected candidate to generate predictive samples of the PU. [5] The method of claim 1, wherein the method of coding the data comprises a method of encoding the 3D video data, the method further comprising: selecting a merge candidate from the merge candidate list; and signaling the position, in the merge candidate list, of the selected merge candidate. [6] The method of claim 1, wherein: generating the merge candidate list comprises deriving the combined bi-predictive merge candidate or candidates after inserting an inter-view predicted motion vector candidate (IPMVC), if available, into the merge candidate list, after performing a derivation process for spatial merge candidates, and after performing a derivation process for a temporal merge candidate, wherein the derivation process for spatial merge candidates derives and inserts up to four spatial motion vector candidates into the merge candidate list, and the derivation process for the temporal merge candidate adds a temporal motion vector predictor (TMVP) candidate, if available, to the merge candidate list.
[7] A video coding apparatus comprising: a data storage medium configured to store three-dimensional (3D) video data; and one or more processors configured to: generate a merge candidate list for coding a video block of the 3D video data, wherein the maximum number of merge candidates in the merge candidate list is equal to 6 and, as part of generating the merge candidate list, the one or more processors: determine whether the number of merge candidates in the merge candidate list is less than 5; and in response to the determination that the number of merge candidates in the merge candidate list is less than 5: derive one or more combined bi-predictive merge candidates, wherein each respective combined bi-predictive merge candidate of the combined bi-predictive merge candidate or candidates corresponds to a respective pair of merge candidates already in the merge candidate list, wherein the respective combined bi-predictive merge candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, and wherein the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to pictures in different reference picture lists; and include the combined bi-predictive merge candidate or candidates in the merge candidate list. [8] The video coding apparatus of claim 7, wherein, as part of generating the merge candidate list, the one or more processors: in response to the determination that there are 5 merge candidates in the merge candidate list before adding any of the combined bi-predictive merge candidate or candidates to the merge candidate list, include a zero candidate in the merge candidate list, wherein the motion vector components of the zero candidate are equal to zero and the reference index of the zero candidate is equal to zero, the reference index indicating a location of a reference picture in a reference picture list. [9] The video coding apparatus of claim 7, wherein the one or more processors generate the merge candidate list without checking any backward view synthesis prediction (BVSP) indicators. [10] The video coding apparatus of claim 7, wherein the one or more processors are configured to decode the 3D video data, and the video block is a prediction unit (PU), the one or more processors being configured to: obtain, from a bit stream, a syntax element that indicates a merge candidate selected from the merge candidate list; and use the motion information of the selected candidate to generate predictive samples of the PU. [11] The video coding apparatus of claim 7, wherein the one or more processors are configured to encode the 3D video data, the one or more processors being configured to: select a merge candidate from the merge candidate list; and signal the position, in the merge candidate list, of the selected merge candidate. [12] The video coding apparatus of claim 7, wherein: the one or more processors are configured to derive the combined bi-predictive merge candidate or candidates after inserting an inter-view predicted motion vector candidate (IPMVC), if available, into the merge candidate list, after performing a derivation process for spatial merge candidates, and after performing a derivation process for a temporal merge candidate, wherein the derivation process for
spatial merge candidates derives and inserts up to four spatial motion vector candidates into the merge candidate list, and the derivation process for the temporal merge candidate adds a temporal motion vector predictor (TMVP) candidate, if available, to the merge candidate list. [13] A video coding apparatus comprising: a device for generating a merge candidate list for coding a video block of three-dimensional (3D) video data, wherein the maximum number of merge candidates in the merge candidate list is equal to 6 and the device for generating the merge candidate list comprises: a device for determining whether the number of merge candidates in the merge candidate list is less than 5; a device for deriving, in response to the determination that the number of merge candidates in the merge candidate list is less than 5, one or more combined bi-predictive merge candidates, wherein each respective combined bi-predictive merge candidate of the combined bi-predictive merge candidate or candidates corresponds to a respective pair of merge candidates already in the merge candidate list, wherein the respective combined bi-predictive merge candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, and wherein the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to pictures in different reference picture lists; and a device for including the combined bi-predictive merge candidate or candidates in the merge candidate list. [14] The video coding apparatus of claim 13, wherein the device for generating the merge candidate list further comprises: a device for including, in response to the determination that there are 5 merge candidates in the merge candidate list before adding any of the combined bi-predictive merge candidate or candidates to the merge candidate list, a zero candidate in the merge candidate list, wherein the motion vector components of the zero candidate are equal to zero and the reference index of the zero candidate is equal to zero, the reference index indicating a location of a reference picture in a reference picture list. [15] The video coding apparatus of claim 13, wherein the generation of the merge candidate list takes place without checking any backward view synthesis prediction (BVSP) indicators. [16] The video coding apparatus of claim 13, wherein the video coding apparatus decodes the 3D video data and the video block is a prediction unit (PU), the video coding apparatus further comprising: a device for obtaining, from a bit stream, a syntax element indicating a merge candidate selected from the merge candidate list; and a device for using the motion information of the selected candidate to generate predictive samples of the PU. [17] The video coding apparatus of claim 13, wherein the video coding apparatus encodes the 3D video data, the video coding apparatus further comprising: a device for selecting a merge candidate from the merge candidate list; and a device for signaling the position, in the merge candidate list, of the selected merge candidate.
[18] The video coding apparatus of claim 13, wherein: generating the merge candidate list comprises deriving the combined bi-predictive merge candidate or candidates after inserting an inter-view predicted motion vector candidate (IPMVC), if available, into the merge candidate list, after performing a derivation process for spatial merge candidates, and after performing a derivation process for a temporal merge candidate, wherein the derivation process for spatial merge candidates derives and inserts up to four spatial motion vector candidates into the merge candidate list, and the derivation process for the temporal merge candidate adds a temporal motion vector predictor (TMVP) candidate, if available, to the merge candidate list. [19] A computer-readable data storage medium having instructions stored thereon which, when executed, cause a video coding apparatus to code three-dimensional (3D) video data, the instructions causing the video coding apparatus to: generate a merge candidate list for coding a video block of the 3D video data, wherein the maximum number of merge candidates in the merge candidate list is equal to 6 and generating the merge candidate list comprises: determining whether the number of merge candidates in the merge candidate list is less than 5; and in response to the determination that the number of merge candidates in the merge candidate list is less than 5: deriving one or more combined bi-predictive merge candidates, wherein each respective combined bi-predictive merge candidate of the combined bi-predictive merge candidate or candidates corresponds to a respective pair of merge candidates already in the merge candidate list, wherein the respective combined bi-predictive merge candidate is a combination of a motion vector of a first merge candidate of the respective pair and a motion vector of a second merge candidate of the respective pair, and wherein the motion vector of the first merge candidate and the motion vector of the second merge candidate refer to pictures in different reference picture lists; and including the combined bi-predictive merge candidate or candidates in the merge candidate list. [20] The computer-readable data storage medium of claim 19, wherein generating the merge candidate list further comprises: in response to the determination that there are 5 merge candidates in the merge candidate list before adding any of the combined bi-predictive merge candidate or candidates to the merge candidate list, including a zero candidate in the merge candidate list, wherein the motion vector components of the zero candidate are equal to zero and the reference index of the zero candidate is equal to zero, the reference index indicating a location of a reference picture in a reference picture list. [21] The computer-readable data storage medium of claim 19, wherein the generation of the merge candidate list takes place without checking any backward view synthesis prediction (BVSP) indicators. [22] The computer-readable data storage medium of claim 19, wherein the video block is a prediction unit (PU), and the instructions further cause the video coding apparatus to: obtain, from a bit stream, a syntax element that indicates a merge candidate selected from the merge candidate list; and use the motion information of the selected candidate to generate predictive samples of the PU.
[23] The computer-readable data storage medium of claim 19, wherein the instructions further cause the video coding apparatus to: select a merge candidate from the merge candidate list; and signal the position, in the merge candidate list, of the selected merge candidate. [24] The computer-readable data storage medium of claim 19, wherein the instructions cause the video coding apparatus to derive the combined bi-predictive merge candidate or candidates after inserting an inter-view predicted motion vector candidate (IPMVC), if available, into the merge candidate list, after performing a derivation process for spatial merge candidates, and after performing a derivation process for a temporal merge candidate, wherein the derivation process for spatial merge candidates derives and inserts up to four spatial motion vector candidates into the merge candidate list, and wherein the derivation process for the temporal merge candidate adds a temporal motion vector predictor (TMVP) candidate, if available, to the merge candidate list.